Abstract

We consider a fully distributed constrained convex optimization problem over a multi-agent network with no central coordinator. We propose an asynchronous gossip-based random projection (GRP) algorithm that solves the distributed problem using only local communications and computations. We analyze the convergence properties of the algorithm for an uncoordinated diminishing stepsize and for a constant stepsize. For a diminishing stepsize, we prove that the iterates of all agents converge to the same optimal point with probability 1. For a constant stepsize, we establish an error bound on the expected distance from the iterates of the algorithm to the optimal point. We also provide simulation results on a distributed robust model predictive control problem.

I. Introduction 

A number of important problems that arise in various application domains, including distributed
control [E|, large-scale machine learning [7], IBTI, wired and wireless networks [9], [10],
[22], [23], can be formulated as a distributed convex constrained minimization problem over a
multi-agent network. The problem is usually defined as a sum of convex objective functions 
over an intersection of convex constraint sets. The goal of the agents is to solve the problem 
in a distributed way, with each agent handling a component of the objective and constraint. 
This is useful either when the problem data are naturally distributed or when the data are too 
large to be conveniently processed by a single agent. Common to these distributed optimization 
problems are the following operational restrictions: 1) a component objective function and a
constraint set are known only to a specific network agent (the problem is fully distributed); 2)
there is no central coordinator that synchronizes actions on the network or works with global 
information; 3) the agents usually have a limited memory, computational power and energy; 
and 4) communication overhead is significant due to the expensive start-up cost and network 
latencies. These restrictions motivate the design of distributed, asynchronous, computationally 
simple and local communication based algorithms. 

The focus of this paper is the development and analysis of an efficient distributed algorithm 
whereby only a pair of agents exchanges local information and updates in an asynchronous 
manner. We propose a gradient descent method with random projections that uses a gossip scheme as its
communication protocol. Random projection-based algorithms have been proposed in [11] (see
also its extended version [12]) for distributed problems with a synchronous update rule, and
in [14] for centralized problems. Synchronous algorithms are often inefficient as they create
bottlenecks and waste CPU cycles, while centralized approaches are inapplicable in situations 
where a central coordinator does not exist. Asynchronous algorithms based on a gossip scheme 
have been proposed and analyzed for a scalar objective function and a diminishing stepsize [26],
and for a vector objective function and a constant stepsize [24]. An asynchronous broadcast-based
algorithm has also been proposed in [fTTl . The gradient-projection algorithms proposed in the 
papers 0, |[T5l . iTTvTl . fl24l . 11261 . IJ30l assume that the agents share a common constraint set 
and the projection is performed on the whole constraint set at each iteration. To accommodate 
the situations where the agents have local constraint sets, the distributed gradient methods with 
distributed projections on local constraint sets have been considered in [16], [29] (see also [28]).
However, even the projection on the entire (local) constraint set often overburdens agents, such 
as wireless sensors, as it requires intensive computations. Furthermore, in some situations, the 
constraint set can be revealed only component-wise in time, and the whole set is not available 
in advance, which makes the existing distributed methods inadequate. Our proposed algorithm 
is intended to accommodate such situations. 

In our algorithm, we efficiently handle the projection at each iteration by performing a 
projection step on the local constraint set that is randomly selected (by nature or by an agent 
itself). For asynchrony, each agent uses either a diminishing stepsize that is uncoordinated with 
those of the other agents or a constant stepsize. Our main goals are to establish the convergence 
of the method with a diminishing stepsize, to estimate the error bound for a constant stepsize, 
and to provide simulation results for the algorithm.

To the best of our knowledge, there is no previous work on asynchronous distributed optimization
algorithms that utilize random projections. Finding probabilistic feasible solutions
through random sampling of constraints for optimization problems with uncertain constraints
has been proposed in flU, 0. Also related are the (centralized) random projection
method proposed by Polyak [20] for a class of convex feasibility problems and the random
projection algorithm [13] for convex set intersection problems. On a broader scale, the work in
this paper is related to the literature on the consensus problem (see for example (8), fTTOll . QiD). 

The rest of this paper is organized as follows. In Section II, we describe the problem of
interest, propose our gossip-based random projection algorithm, and state assumptions on the
problem and the network. Section III states the main results of the paper, while in Sections IV
and V we provide the proofs of the results. We present the simulation results on a distributed
model predictive control problem in Section VI and conclude with a summary in Section VII.
The Appendix contains the proofs of the lemmas given in Sections IV and V.
Notation. A vector is viewed as a column. We write $x'$ to denote the transpose of a vector $x$. The scalar product of two vectors $x$ and $y$ is $\langle x, y\rangle$. We use $\mathbf{1}$ to denote a vector whose entries are all 1 and $\|x\|$ to denote the standard Euclidean norm. We write $\mathrm{dist}(x, X)$ for the distance of a vector $x$ from a closed convex set $X$, i.e., $\mathrm{dist}(x, X) = \min_{v \in X} \|v - x\|$. We use $\Pi_X[x]$ for the projection of a vector $x$ on the set $X$, i.e., $\Pi_X[x] = \arg\min_{v \in X} \|v - x\|^2$. We use $E[Z]$ to denote the expectation of a random variable $Z$. We often abbreviate with probability 1 as w.p.1.

II. Problem Set-up, Algorithm and Assumptions 

We consider an optimization problem where the objective function and constraint sets are 
distributed among $m$ agents over a network. Let an undirected graph $G = (V, E)$ represent the topology of the network, with the vertex set $V = \{1, \ldots, m\}$ and the edge set $E \subseteq V \times V$. Let $\mathcal{N}(i)$ be the set of the neighbors of agent $i$, i.e., $\mathcal{N}(i) = \{j \in V \mid \{i, j\} \in E\}$. The goal of the agents is to cooperatively solve the following optimization problem:

$$\min_{x} \; f(x) \triangleq \sum_{i=1}^m f_i(x) \quad \text{s.t.} \quad x \in X \triangleq \bigcap_{i=1}^m X_i, \tag{1}$$

where $f_i : \mathbb{R}^d \to \mathbb{R}$ is a convex function, representing the local objective of agent $i$, and $X_i \subseteq \mathbb{R}^d$ is a closed convex set, representing the local constraint set of agent $i$. The function $f_i$ and the set $X_i$ are known to agent $i$ only.

We assume that problem (1) is feasible. Moreover, we assume each set $X_i$ is defined as the intersection of a collection of simple convex sets. That is, $X_i$ can be represented as $X_i = \bigcap_{j \in I_i} X_i^j$, where the superscript $j$ is used to identify a component set and $I_i$ is a (possibly infinite) set of indices. In some applications, $X_i$ may not be explicitly given in advance due to online constraints or uncertainty. For example, consider the case when $X_i$ is given by

$$X_i = \{x \in \mathbb{R}^d \mid \langle a + \xi, x\rangle \le b\},$$

where $a \in \mathbb{R}^d$, $b \in \mathbb{R}$ are deterministic and $\xi \in \mathbb{R}^d$ is a Gaussian random noise. In such a case, a projection-based distributed algorithm cannot be directly applied to solve problem (1) since $|I_i|$ is infinite and the projection of a point on the uncertain set $X_i$ is impossible. However, a component $X_i^j$ can be realized from a random selection of $\xi$ and the projection onto the realized component is always possible. Our algorithm is based on such random projections.
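As a rough illustration of a single random projection of this kind, the following Python sketch draws one realization of the noise and projects a point onto the resulting halfspace. The helper names, the noise level `sigma`, and the numerical values of `a`, `b` and `x` are assumptions made only for this example.

```python
import random

def project_halfspace(x, g, b):
    """Euclidean projection of x onto the halfspace {y : <g, y> <= b}."""
    viol = sum(gi * xi for gi, xi in zip(g, x)) - b
    if viol <= 0.0:
        return list(x)                      # x already satisfies the constraint
    nrm2 = sum(gi * gi for gi in g)
    return [xi - viol * gi / nrm2 for xi, gi in zip(x, g)]

def sample_component(a, sigma, rng):
    """Draw xi ~ N(0, sigma^2 I) and return the realized normal vector a + xi."""
    return [ai + rng.gauss(0.0, sigma) for ai in a]

rng = random.Random(0)
a, b = [1.0, 2.0], 1.0                      # illustrative deterministic data
x = [3.0, 4.0]                              # point to be projected
g = sample_component(a, 0.1, rng)           # one realized component of X_i
p = project_halfspace(x, g, b)              # projection onto that component
```

Each call touches only one realized halfspace, so the cost of the projection step is a single inner product and a vector update.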

We propose a distributed optimization algorithm for problem (1) that is based on the random
projections and the gossip communication protocol. Gossip algorithms robustly achieve consensus
through sparse communications in the network. That is, only one edge $\{i, j\}$ in the network
is randomly selected for communication at each iteration, and agents i and j simply average their 
values. From now on, we refer to our algorithm as Gossip-based Random Projection (GRP). 

GRP uses an asynchronous time model as in [4]. Each agent has a local clock that ticks at a Poisson rate of 1. The setting can be visualized as having a single virtual clock that ticks whenever any of the local Poisson clocks ticks. Thus, the ticks of the virtual clock form a Poisson random process with rate $m$. Let $Z_k$ be the absolute time of the $k$th tick of the virtual clock. The time is discretized according to the intervals $[Z_{k-1}, Z_k)$ and this time slot corresponds to our discrete time $k$. Let $I_k$ denote the index of the agent that wakes up at time $k$ and $J_k$ denote the index of a neighbor of agent $I_k$ that is selected for communication. We assume that only one agent wakes up at a time. The distribution by which $J_k$ is selected is characterized by a nonnegative stochastic $m \times m$ matrix $\Pi = [\pi_{ij}]$ that conforms with the graph topology $G = (V, E)$, i.e., $\pi_{ij} > 0$ only if $\{i, j\} \in E$. At iteration $k$, agent $I_k$ wakes up and contacts one of its neighbors $J_k$ with probability $\pi_{I_k J_k}$.

Let $x_i(k)$ denote the estimate of agent $i$ at time $k$. GRP updates these estimates according to the following rule. Each agent starts with some initial vector $x_i(0)$, which can be randomly selected. For $k \ge 1$, agents other than $I_k$ and $J_k$ do not update:

$$x_i(k) = x_i(k-1) \quad \text{for all } i \notin \{I_k, J_k\}. \tag{2}$$

Agents $I_k$ and $J_k$ calculate the average of their estimates, and adjust the average by using their local gradient information and by projecting onto a randomly selected component of their local constraint sets, i.e., for $i \in \{I_k, J_k\}$:

$$v_i(k) = \big(x_{I_k}(k-1) + x_{J_k}(k-1)\big)/2, \qquad x_i(k) = \Pi_{X_i^{\Omega_i(k)}}\big[v_i(k) - \alpha_i(k) \nabla f_i(v_i(k))\big], \tag{3}$$

where $\alpha_i(k)$ is a stepsize of agent $i$, and $\Omega_i(k)$ is a random variable drawn from the set $I_i$.
The key difference between the work in [15], [16], [29] and this paper is the random projection step. Instead of projecting on the whole constraint set $X_i$, a component set $X_i^{\Omega_i(k)}$ is selected (or revealed by nature) and the projection is made on that set, which reduces the required computations per iteration.
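The update rule can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the paper: the component sets are halfspaces, the neighbor is chosen uniformly, the objectives are $f_i(x) = \frac{1}{2}\|x - c_i\|^2$ with hypothetical targets $c_i$, and each agent uses the stepsize $1/\Gamma_i(k)$, where $\Gamma_i(k)$ counts its own updates.

```python
import random

def project_halfspace(x, g, b):
    """Projection of x onto {y : <g, y> <= b} (one simple component set)."""
    viol = sum(gi * xi for gi, xi in zip(g, x)) - b
    if viol <= 0.0:
        return list(x)
    nrm2 = sum(gi * gi for gi in g)
    return [xi - viol * gi / nrm2 for xi, gi in zip(x, g)]

def grp_step(x, counts, neighbors, grad, components, rng):
    """One GRP iteration; x and counts are modified in place."""
    i = rng.randrange(len(x))              # agent I_k wakes up
    j = rng.choice(neighbors[i])           # neighbor J_k (uniform choice: assumption)
    avg = [(u + w) / 2.0 for u, w in zip(x[i], x[j])]   # common average v_i(k)
    for a in (i, j):
        counts[a] += 1                     # local update counter of agent a
        step = 1.0 / counts[a]             # uncoordinated diminishing stepsize
        y = [v - step * gv for v, gv in zip(avg, grad(a, avg))]
        g, b = rng.choice(components[a])   # randomly selected component set
        x[a] = project_halfspace(y, g, b)
    return i, j

# Hypothetical instance: 4 agents on a ring, f_i(x) = 0.5 * ||x - c_i||^2,
# and every X_i is the box [-1, 1]^2 written as four halfspaces.
targets = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [0.0, 0.0]]
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
box = [([1.0, 0.0], 1.0), ([-1.0, 0.0], 1.0),
       ([0.0, 1.0], 1.0), ([0.0, -1.0], 1.0)]
components = [box] * 4

def grad(i, v):
    """Gradient of the illustrative objective f_i at v."""
    return [vk - ck for vk, ck in zip(v, targets[i])]

rng = random.Random(1)
x = [list(c) for c in targets]             # initial estimates x_i(0)
counts = [0, 0, 0, 0]
pairs = [grp_step(x, counts, neighbors, grad, components, rng)
         for _ in range(2000)]
```

Only the two agents on the selected edge do any work in an iteration, and each of them projects onto a single halfspace rather than onto the whole box.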

For an alternative representation of GRP we define a nonnegative matrix $W(k)$ as follows:

$$W(k) = I - \frac{1}{2}(e_{I_k} - e_{J_k})(e_{I_k} - e_{J_k})' \quad \text{for } k \ge 1,$$

where $I$ is the $m$-dimensional identity matrix, $e_i \in \mathbb{R}^m$ is a vector whose $i$th entry is equal to 1 and all other entries are equal to 0. Each $W(k)$ is doubly stochastic by construction, implying that $E[W(k)]$ is also doubly stochastic. Using $W(k)$, algorithm (2)-(3) can be equivalently represented as

$$v_i(k) = \sum_{j=1}^m [W(k)]_{ij}\, x_j(k-1), \tag{4a}$$

$$p_i(k) = \Pi_{X_i^{\Omega_i(k)}}\big[v_i(k) - \alpha_i(k)\nabla f_i(v_i(k))\big] - v_i(k), \tag{4b}$$

$$x_i(k) = v_i(k) + p_i(k)\,\chi_{\{i \in \{I_k, J_k\}\}}, \tag{4c}$$

where $\chi_{\mathcal{E}}$ is the characteristic-event function, i.e., $\chi_{\mathcal{E}} = 1$ if $\mathcal{E}$ happens, and $\chi_{\mathcal{E}} = 0$ otherwise. From here onward, we write $\bar W = E[W(k)]$, since the matrices $W(k)$ are identically distributed. Let $\lambda$ denote the second largest eigenvalue of $\bar W$. If the underlying communication network is connected, the incidence graph associated with the positive entries in the matrix $\bar W$ is also connected, with a self-loop at each node. Hence, we have $\lambda < 1$.
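The structure of the gossip matrices can be checked on a small example. The Python sketch below builds one realization $W(k)$ and the expected matrix for a ring of four agents; the ring, the uniform neighbor selection and all function names are assumptions chosen for illustration.

```python
def gossip_matrix(m, i, j):
    """W = I - (1/2)(e_i - e_j)(e_i - e_j)' for i != j, as a list of rows."""
    W = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    W[i][i] = W[j][j] = 0.5
    W[i][j] = W[j][i] = 0.5
    return W

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    n = len(B)
    return [[sum(A[r][t] * B[t][c] for t in range(n))
             for c in range(len(B[0]))] for r in range(len(A))]

def expected_gossip_matrix(P):
    """E[W(k)] when the waking agent is uniform and P[i][j] = pi_ij."""
    m = len(P)
    Wbar = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if P[i][j] > 0.0:
                W = gossip_matrix(m, i, j)
                for r in range(m):
                    for c in range(m):
                        Wbar[r][c] += P[i][j] * W[r][c] / m
    return Wbar

# Ring of 4 agents with uniform neighbor selection (illustrative choice).
P = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
W = gossip_matrix(4, 0, 1)
Wbar = expected_gossip_matrix(P)
row_sums = [sum(row) for row in W]
col_sums = [sum(W[r][c] for r in range(4)) for c in range(4)]
```

In this example every realization has unit row and column sums and satisfies $W(k)W(k) = W(k)$, so multiplying by $W(k)$ simply replaces the two selected estimates by their average.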
In the convergence analysis of the algorithm (4a)-(4c), we use two different choices of stepsize. For a diminishing stepsize, we use $\alpha_i(k) = \frac{1}{\Gamma_i(k)}$, where $\Gamma_i(k)$ denotes the number of updates that agent $i$ has performed until time $k$. Since every agent $i$ has access to a locally defined quantity $\Gamma_i(k)$, the stepsize of agent $i$ is independent of every other agent and no coordination is needed for its update. Another choice that we consider is a constant deterministic stepsize $\alpha_i(k) = \alpha_i > 0$.
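The uncoordinated diminishing stepsize needs nothing more than a local counter. The following Python sketch simulates the wake-up process on a hypothetical ring of four agents with uniform neighbor selection and derives each agent's stepsize from its own update count; the graph, the seed and the number of iterations are assumptions for illustration.

```python
import random

def simulate_update_counts(neighbors, num_iters, seed=0):
    """Count how often each agent updates under random gossip selection."""
    rng = random.Random(seed)
    m = len(neighbors)
    counts = [0] * m
    for _ in range(num_iters):
        i = rng.randrange(m)            # agent that wakes up
        j = rng.choice(neighbors[i])    # neighbor it contacts (uniform: assumption)
        counts[i] += 1                  # both endpoints of the edge update
        counts[j] += 1
    return counts

neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]   # ring of 4 agents
K = 1000
counts = simulate_update_counts(neighbors, K)
# Each agent's stepsize depends only on its own counter.
stepsizes = [1.0 / c for c in counts]
```

Exactly two agents update per iteration, so the counters always sum to $2K$, while the individual counters (and hence the stepsizes) differ from agent to agent.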

We next discuss our assumptions, the first of which deals with the network. 

Assumption 1: The underlying graph $G = (V, E)$ is connected. Furthermore, the neighbor selection process is i.i.d., whereby at any time agent $i$ is chosen by its neighbor $j \in \mathcal{N}(i)$ with probability $\pi_{ji} > 0$ ($\pi_{ji} = 0$ if $j \notin \mathcal{N}(i)$) independently of the other agents in the network.

We use the following assumption for the functions $f_i$ and the sets $X_i^j$.

Assumption 2: Let the following conditions hold:

(a) The sets $X_i^j$, $j \in I_i$, are closed and convex for every $i \in V$.

(b) Each function $f_i : \mathbb{R}^d \to \mathbb{R}$ is convex over $\mathbb{R}^d$.

(c) Each function $f_i$ is differentiable and has Lipschitz gradients with a constant $L_i$ over $\mathbb{R}^d$,

$$\|\nabla f_i(x) - \nabla f_i(y)\| \le L_i \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^d.$$

(d) The gradients $\nabla f_i(x)$ are bounded over the set $X$, i.e., there is a constant $G_f$ such that

$$\|\nabla f_i(x)\| \le G_f \quad \text{for all } x \in X \text{ and all } i \in V.$$

For example, Assumption 2(d) is satisfied when the constraint set $X$ is compact.

The next assumption states set regularity, which is crucial in our convergence analysis.
Assumption 3: There exists a constant $c > 0$ such that for all $i \in V$ and $x \in \mathbb{R}^d$,

$$\mathrm{dist}^2(x, X) \le c\, E\big[\mathrm{dist}^2(x, X_i^{\Omega_i(k)}) \,\big|\, \Omega_j(\ell),\ \ell \in [1, k),\ j \in V\big].$$

Assumption 3 holds if each set $X_i^j$ is affine, or the constraint set $X$ has a nonempty interior.

III. Main Results 

In this section, we state the main results of this paper. The detailed proofs of these results are given later on in Sections IV and V. We introduce the following notation regarding the optimal value and optimal solutions of problem (1):

$$f^* = \min_{x \in X} f(x), \qquad X^* = \{x \in X \mid f(x) = f^*\}.$$

Our first result shows the convergence of the method with probability 1 for a diminishing stepsize.

Proposition 1 (Convergence w.p.1): Let Assumptions 1-3 hold. Assume that problem (1) has a nonempty optimal set $X^*$ and the iterates $\{x_i(k)\}$ are generated by algorithm (4a)-(4c) with $\alpha_i(k) = 1/\Gamma_i(k)$. Then, the sequences $\{x_i(k)\}$, for $i \in V$, converge to some random point $x^*$ in the optimal set $X^*$ with probability 1, i.e., $\lim_{k \to \infty} x_i(k) = x^*$ w.p.1 for all $i \in V$.

Proposition 1 states that the agents asymptotically reach an agreement on a random point in the optimal set $X^*$. To get some insights into the convergence rate, we consider a constant stepsize $\alpha_i(k) = \alpha_i > 0$ for $i \in V$, and establish a limiting error bound assuming that each $f_i$ is strongly convex over the set $X$ with a constant $\sigma_i > 0$. The bound will depend on the probabilities of agent updates, which we formally describe as follows. Let $E_i(k) = \{i \in \{I_k, J_k\}\}$ be the event that agent $i$ updates at time $k$, and let $\gamma_i$ be the probability of the event $E_i(k)$. Then, $\gamma_i = \frac{1}{m} + \frac{1}{m}\sum_{j \in \mathcal{N}(i)} \pi_{ji}$ for all $i \in V$, where $\pi_{ji} > 0$ is the probability that agent $i$ is chosen by its neighbor $j$ to communicate.

For the constant stepsize, we will also use the following assumption. 

Assumption 4: Let the convexity requirement for $f_i$ in Assumption 2(b) be replaced by the requirement that each function $f_i$ is strongly convex with a constant $\sigma_i > 0$ over $\mathbb{R}^d$. In addition, assume that the stepsizes $\alpha_i$ are such that for all $i \in V$:

(a) $0 < \alpha_i \sigma_i - 4(2 + c)\alpha_i^2 L_i^2 < 1$;

(b) $0 < \gamma_i\big(\alpha_i \sigma_i - 4(2 + c)\alpha_i^2 L_i^2\big) - \frac{\Delta_{\gamma\alpha}}{m} < 1$, where $\Delta_{\gamma\alpha} = \max_i\{\gamma_i \alpha_i\} - \min_i\{\gamma_i \alpha_i\}$.

We have the following result for the asymptotic error bound. 

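Proposition 2 (Error bound): Let Assumptions 1-4 hold. Then, for the iterate sequences $\{x_i(k)\}$, $i \in V$, generated by algorithm (4a)-(4c) with a constant stepsize $\alpha_i(k) = \alpha_i > 0$, we have

$$\limsup_{k \to \infty} \frac{1}{m}\sum_{i=1}^m E\big[\|x_i(k) - x^*\|^2\big] \le \frac{1}{q}\, 4\bar\gamma \bar\alpha^2 G_f^2 \left(\frac{\sqrt{C}}{1 - \sqrt{\lambda}} + 2(1 + c)\right) + \frac{1}{q}\Delta_{\gamma\alpha} G_f^2,$$

where $x^*$ is the (unique) solution to problem (1),

$$q = \min_i\{\gamma_i \rho_i\} - \Delta_{\gamma\alpha}/m, \qquad \rho_i = \alpha_i \sigma_i - 8(1 + c)\alpha_i^2 L_i^2 \quad \text{for all } i \in V,$$

$\bar\gamma = \max_i \gamma_i$, $\bar\alpha = \max_i \alpha_i$, and C = 4 [ j^^^y^ + lj •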

Proposition 2 provides an asymptotic error bound for the average of the expected distances between the iterates of the GRP algorithm and the optimal solution $x^*$. The first error term is due to the combined effects of the distributed computations over the network, which is controlled by the spectral gap $1 - \sqrt{\lambda}$ of the matrix $\bar W$, and the non-diminishing stepsize (common to gradient descent algorithms). The last term involves an error term $\Delta_{\gamma\alpha} G_f^2$ due to the different values of $\gamma_i \alpha_i$ for different agents. We note that if $\gamma_i \alpha_i = \nu$ for some $\nu \in (0, 1)$ and for all agents $i$, then this error would be 0. The condition $\gamma_i \alpha_i = \nu$ will hold when the graph is regular (all $\gamma_i$ are the same) and all agents use the same stepsize $\alpha_i = \alpha$. There is another, more interesting case when $\gamma_i \alpha_i = \nu$ holds for all $i$, which is as follows: the agents that update more frequently use a smaller stepsize, while the agents that update less frequently use a larger stepsize, i.e., if $\gamma_i > \gamma_j$ then $\alpha_i < \alpha_j$, and vice versa.
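The balancing of stepsizes against update frequencies can be made concrete. The Python sketch below computes the update probabilities $\gamma_i = \frac{1}{m} + \frac{1}{m}\sum_j \pi_{ji}$ for a hypothetical star graph and then chooses $\alpha_i = \nu/\gamma_i$, which makes $\Delta_{\gamma\alpha}$ vanish up to rounding. The graph, the value of $\nu$ and the function names are assumptions for illustration only.

```python
def update_probabilities(P):
    """gamma_i = 1/m + (1/m) * sum_j pi_ji, where P[j][i] = pi_ji (0 if not neighbors)."""
    m = len(P)
    return [(1.0 + sum(P[j][i] for j in range(m))) / m for i in range(m)]

def balanced_stepsizes(gamma, nu):
    """Constant stepsizes alpha_i = nu / gamma_i, so that gamma_i * alpha_i = nu."""
    return [nu / g for g in gamma]

# Star graph (illustrative): agent 0 is the hub, agents 1..3 are leaves.
P = [[0.0, 1 / 3, 1 / 3, 1 / 3],
     [1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]
gamma = update_probabilities(P)
alpha = balanced_stepsizes(gamma, nu=0.01)
products = [g * a for g, a in zip(gamma, alpha)]
delta = max(products) - min(products)   # plays the role of Delta_{gamma alpha}
```

In this instance the hub takes part in every exchange and therefore receives the smallest stepsize, while the rarely updating leaves use a stepsize three times larger.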

IV. Convergence Analysis 

In this section, we prove Proposition 1. We start with some basic results from the literature, which will be used later on. The analysis relies on the nonexpansive projection property (see [2] for its proof), stating that: for a closed convex set $X \subseteq \mathbb{R}^d$, the projection mapping $\Pi_X : \mathbb{R}^d \to X$ is strictly nonexpansive,

$$\|\Pi_X[x] - y\|^2 \le \|x - y\|^2 - \|\Pi_X[x] - x\|^2 \quad \text{for all } x \in \mathbb{R}^d \text{ and for all } y \in X, \tag{5}$$

and, therefore, it is continuous. As an immediate consequence of the preceding relation, we have

$$\|\Pi_X[x] - \Pi_X[v]\| \le \|x - v\| \quad \text{for all } x, v \in \mathbb{R}^d. \tag{6}$$
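Both projection properties are easy to verify numerically for a concrete set. The Python sketch below does this for the Euclidean unit ball; the choice of set, the sampling distribution and the tolerance `tol` are assumptions made for this check and play no role in the analysis.

```python
import math
import random

def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of the given radius."""
    n = math.sqrt(sum(t * t for t in x))
    return list(x) if n <= radius else [radius * t / n for t in x]

def sqdist(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

rng = random.Random(0)
tol = 1e-9                       # slack for floating-point rounding
ok5, ok6 = True, True
for _ in range(1000):
    x = [rng.gauss(0.0, 2.0) for _ in range(3)]
    v = [rng.gauss(0.0, 2.0) for _ in range(3)]
    y = project_ball([rng.gauss(0.0, 2.0) for _ in range(3)])   # a point of X
    px, pv = project_ball(x), project_ball(v)
    # Relation (5): projecting moves x strictly closer to every y in X.
    ok5 = ok5 and sqdist(px, y) <= sqdist(x, y) - sqdist(px, x) + tol
    # Relation (6): the projection does not increase distances.
    ok6 = ok6 and math.sqrt(sqdist(px, pv)) <= math.sqrt(sqdist(x, v)) + tol
```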



We also make use of the following convergence result (see [19, Lemma 11, p. 49-50]).

Lemma 1: Let $\{v_k\}$, $\{u_k\}$, $\{a_k\}$ and $\{b_k\}$ be non-negative random sequences such that $E[v_{k+1} \mid \mathcal{F}_k] \le (1 + a_k)v_k - u_k + b_k$ for all $k \ge 0$ w.p.1, where $\mathcal{F}_k = \{v_i, u_i, a_i, b_i,\ 0 \le i \le k\}$. If $\sum_{k=0}^\infty a_k < \infty$ and $\sum_{k=0}^\infty b_k < \infty$ w.p.1, then $\lim_{k\to\infty} v_k = v$ for a random variable $v \ge 0$ w.p.1, and $\sum_{k=0}^\infty u_k < \infty$ w.p.1.

The GRP algorithm has three random elements: random gossip communications, random stepsizes and random projections, which are all independent. They will be handled as follows.

Random Gossip Communications: At each iteration of the algorithm, a gossip communication matrix $W(k)$ is realized independently of the past. In the analysis, we can work with the expected matrix $\bar W$ instead of $W(k)$ due to the following properties of the matrices $W(k)$: (1) Each $W(k)$ is a symmetric projection matrix; hence $W(k)'W(k) = W(k)$ and $\big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}'\big)'\big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}'\big) = W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}'$. (2) Since $\bar W$ is doubly stochastic, the largest eigenvalue of $\bar W$ is 1. Therefore, the largest eigenvalue of the matrix $\bar W - \frac{1}{m}\mathbf{1}\mathbf{1}'$ is the same as $\lambda$ (the second largest eigenvalue of $\bar W$). These two properties immediately yield the following relation for any $y \in \mathbb{R}^m$,

$$E\left[\left\|\Big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}'\Big) y\right\|^2\right] \le \lambda \|y\|^2. \tag{7}$$

Furthermore, in view of the connectivity of the underlying graph (Assumption 1), we have $\lambda < 1$.
Random Stepsizes: Since the underlying communication graph $G = (V, E)$ is static, due to the gossip-based communications, the random diminishing stepsize $\alpha_i(k) = \frac{1}{\Gamma_i(k)}$ exhibits the same behavior as the deterministic stepsize $1/k$ in the long run. This enables us to handle the cross dependencies of the random stepsizes and the other randomness in the GRP method.
Random Projections: A projection error is incurred at each iteration of the algorithm since the GRP projects onto one randomly selected set from the collection defining the overall constraint set $X$. However, due to the regularity property in the expected sense, as given in Assumption 3, the random projections drive the iterates toward the constraint set $X$ w.p.1 (cf. Lemma 3).

Our convergence analysis is guided by the preceding observations, and it is constructed along the following main lines: (1) the estimates $v_i(k)$ are approaching the constraint set $X$ asymptotically w.p.1; (2) the distances $\|v_i(k) - x_i(k)\|$ diminish with probability 1; and (3) the agents' estimates $x_i(k)$ eventually arrive at a consensus point that lies in the optimal set $X^*$. For this, we first establish a basic relation for the iterates of the GRP algorithm (Lemma 2), which allows us to apply the (almost) supermartingale convergence result of Lemma 1 by letting $v_k = \sum_{i=1}^m \|x_i(k) - x^*\|^2$ for some optimal point $x^*$. To accommodate the use of the (almost) supermartingale convergence result, we use several auxiliary results.

A. Basic Results for GRP 

We define the history of the algorithm as follows. Let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the entire history of the algorithm up to time $k$ inclusively, i.e., for all $k \ge 1$,

$$\mathcal{F}_k = \{x_i(0);\ i \in V\} \cup \{I_\ell, J_\ell, \Omega_i(\ell);\ i \in \{I_\ell, J_\ell\},\ 1 \le \ell \le k\},$$

and $\mathcal{F}_0 = \{x_i(0);\ i \in V\}$.

We provide several important relations for the GRP method. First, we provide a relation for the iterates obtained after one step of algorithm (4a)-(4c) and a point in the constraint set $X$. The lemma relies on the fact that the event that agent $i$ updates at any time is independent of the past.
Lemma 2: [Basic Iterate Relation] Let Assumptions 2 and 3 hold. Let $\{x_i(k)\}$ be the iterates generated by the algorithm (4a)-(4c). Then, for any $q \in (0, 1/2)$ there is a sufficiently large $\tilde k$, such that with probability 1, for all $x \in X$, $k \ge \tilde k$ and $i \in V$,

2 



E 



Xi (k) -x\\ 2 \ .F fc _i] < (l + 0) E [\\ Vi {k) - x\\ 2 | JW] _ _E [/<(*(*)) - /-(x) | .7^] 

-^E [dist 2 K(fc), X) | J- fc _J + 4^ + 4^E [\\ Zi {k) -x\\ 2 | JW]. 
4c L J 0~ q ki~ q L J 

where $z_i(k) = \Pi_X[v_i(k)]$, $a_j > 0$ are some constants, $c$ is the scalar from Assumption 3, and $\gamma_i$ is the probability that agent $i$ updates.

The proof of the lemma is in Appendix A, where the constants $a_j$ are also defined.

In the next lemma, we show that the distances between the estimates $v_i(k)$ and the constraint set $X$ go to zero for all $i$, with probability 1 as $k \to \infty$. We also show that the errors $e_i(k) = x_i(k) - v_i(k)$ converge to zero with probability 1.

Lemma 3: [Projection Error] Let Assumptions 1-3 hold. Then, with probability 1, we have

(a) $\sum_{k=1}^\infty E[\mathrm{dist}^2(v_i(k), X) \mid \mathcal{F}_{k-1}] < \infty$ and $\lim_{k\to\infty} \mathrm{dist}(v_i(k), X) = 0$ for all $i \in V$.

(b) $\sum_{k=1}^\infty E[\|e_i(k)\|^2 \mid \mathcal{F}_{k-1}] < \infty$ and $\lim_{k\to\infty}\|e_i(k)\| = 0$ for all $i \in V$, where $e_i(k) = x_i(k) - v_i(k)$ for all $i \in V$ and $k \ge 1$.

Lemma 3(a) and Lemma 3(b) imply that $\lim_{k\to\infty} \mathrm{dist}^2(x_i(k), X) = 0$ with probability 1 for all $i \in V$. However, the lemma does not imply that the sequences $x_i(k)$ converge, nor that their differences $\|x_i(k) - x_j(k)\|$ are vanishing. A step toward this is provided by the following lemma, which shows a relation for the agent disagreements on the vectors $v_i(k)$.

Lemma 4: [Disagreement] Let Assumptions [QI2] hold. Let {vi(k)} be generated by method 
(4a)-(4c) with $\alpha_i(k) = 1/\Gamma_i(k)$ and $\Gamma_i(k)$ being the number of updates that agent $i$ has performed until time $k$. Then, for $\bar v(k) = \frac{1}{m}\sum_{\ell=1}^m v_\ell(k)$ we have $\sum_{k=1}^\infty \frac{1}{k} E[\|v_i(k) - \bar v(k)\| \mid \mathcal{F}_{k-1}] < \infty$ with probability 1 for all $i \in V$.
The proofs of Lemma 3 and Lemma 4 are, respectively, in Appendix B and Appendix C.

B. Proof of Proposition 1

We assert the convergence of method (4a)-(4c) using the lemmas established in Section IV-A. Note that Lemma 3 allows us to infer that $v_i(k)$ approaches the set $X$, while Lemma 4 allows us to claim that any two sequences $\{v_i(k)\}$ and $\{v_j(k)\}$ have the same limit points with probability 1. To claim the convergence of the iterates to an optimal solution, it remains to relate the limit points of $\{v_i(k)\}$ and the solutions of problem (1). This connection is provided by the iterate relation of Lemma 2, supported by the convergence result in Lemma 1. We start the proof by invoking Lemma 2, stating that for any $q \in (0, 1/2)$, and all $x \in X$ and $k \ge \tilde k$, w.p.1 we have

E [||zi(A;) - x\\ 2 | J- fc -i] < (l + |i) E [\\ Vi {k) -x\\ 2 \ J- fc _J - ^E [/<(*(*)) - £(*) | J^x] 
-^E [dist 2 fa(A;),*) | J-fe-J + 4^ + 4^E [||^(fc) -£|| 2 | JW], 

where $z_i(k) = \Pi_X[v_i(k)]$. Since $\|z_i(k) - x\| \le \|v_i(k) - x\|$ by the nonexpansive projection
property in Eq. (6), we obtain

E [\\x t (k) - xf | J- fe _J < A + *l.\ E [||^(A,) - xf | J- fc _J 

- ^E [/«(*(*)) - / 4 (x) | .Ffc-i] - ^E [dist 2 (^), *) | J- fc -i] + 4 s -, (8) 

k 4c ^2-9 

where $a_4 = a_1 + a_3$. Further, by the definition of $v_i(k)$ in (4a), the convexity of the squared-norm function and the doubly stochastic matrices $W(k)$, we have

$$\sum_{i=1}^m E\big[\|v_i(k) - x\|^2 \mid \mathcal{F}_{k-1}\big] \le \sum_{i=1}^m \sum_{j=1}^m [\bar W]_{ij}\, \|x_j(k-1) - x\|^2 = \sum_{j=1}^m \|x_j(k-1) - x\|^2. \tag{9}$$

Summing relations in (8) over $i$ and using Eq. (9) yields w.p.1 for all $x \in X$ and all $k \ge \tilde k$,



in s \ m 

^E[||x 4 (fc)-x|| 2 | J- fc _J < (l + -£- E M*" 1 ) " 



ill 2 



111 



Recall that $f(x) = \sum_{i=1}^m f_i(x)$. Let $\bar z(k) = \frac{1}{m}\sum_{\ell=1}^m z_\ell(k)$. Using $\bar z(k)$ and $f$, we can rewrite the term $f_i(z_i(k)) - f_i(x)$ as follows:

$$\sum_{i=1}^m \big(f_i(z_i(k)) - f_i(x)\big) = \sum_{i=1}^m \big(f_i(z_i(k)) - f_i(\bar z(k))\big) + \big(f(\bar z(k)) - f(x)\big). \tag{11}$$

Furthermore, using the convexity of each function $f_i$, we obtain

$$\sum_{i=1}^m \big(f_i(z_i(k)) - f_i(\bar z(k))\big) \ge \sum_{i=1}^m \langle \nabla f_i(\bar z(k)), z_i(k) - \bar z(k)\rangle \ge -\sum_{i=1}^m \|\nabla f_i(\bar z(k))\|\, \|z_i(k) - \bar z(k)\|.$$

Since $\bar z(k)$ is a convex combination of $z_i(k) \in X$, it follows that $\bar z(k) \in X$. Using $\bar z(k) \in X$ and the uniform bound $G_f$ for the norms $\|\nabla f_i(x)\|$ on the set $X$ (Assumption 2(d)), we obtain

$$\sum_{i=1}^m \big(f_i(z_i(k)) - f_i(\bar z(k))\big) \ge -G_f \sum_{i=1}^m \|z_i(k) - \bar z(k)\|. \tag{12}$$
We next consider the term $\|z_i(k) - \bar z(k)\|$, for which by using $\bar z(k) = \frac{1}{m}\sum_{\ell=1}^m z_\ell(k)$ we have

$$\|z_i(k) - \bar z(k)\| = \Big\|\frac{1}{m}\sum_{\ell=1}^m \big(z_i(k) - z_\ell(k)\big)\Big\| \le \frac{1}{m}\sum_{\ell=1}^m \|z_i(k) - z_\ell(k)\| \le \frac{1}{m}\sum_{\ell=1}^m \|v_i(k) - v_\ell(k)\|,$$

where the first inequality is obtained by the convexity of the norm and the last inequality follows by the projection property in Eq. (6). Further, by letting $\bar v(k) = \frac{1}{m}\sum_{\ell=1}^m v_\ell(k)$ and using $\|v_i(k) - v_\ell(k)\| \le \|v_i(k) - \bar v(k)\| + \|v_\ell(k) - \bar v(k)\|$, we obtain $\|z_i(k) - \bar z(k)\| \le \|v_i(k) - \bar v(k)\| + \frac{1}{m}\sum_{\ell=1}^m \|v_\ell(k) - \bar v(k)\|$ for every $i \in V$. Upon summing these relations over $i \in V$, we find

$$\sum_{i=1}^m \|z_i(k) - \bar z(k)\| \le 2\sum_{i=1}^m \|v_i(k) - \bar v(k)\|. \tag{13}$$
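Relation (13) can be sanity-checked on random data. In the Python sketch below the constraint set is taken to be the unit ball and the points $v_i(k)$ are Gaussian samples; both choices, and the helper names, are assumptions made only for this check.

```python
import math
import random

def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of the given radius."""
    n = math.sqrt(sum(t * t for t in x))
    return list(x) if n <= radius else [radius * t / n for t in x]

def mean(points):
    """Componentwise average of a list of vectors."""
    m = len(points)
    return [sum(p[k] for p in points) / m for k in range(len(points[0]))]

def dist(u, v):
    """Euclidean distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)
m = 5
v = [[rng.gauss(0.0, 2.0) for _ in range(2)] for _ in range(m)]   # stand-ins for v_i(k)
z = [project_ball(p) for p in v]                                  # z_i(k) = projection of v_i(k)
vbar, zbar = mean(v), mean(z)
lhs = sum(dist(zi, zbar) for zi in z)          # sum_i ||z_i - zbar||
rhs = 2.0 * sum(dist(vi, vbar) for vi in v)    # 2 * sum_i ||v_i - vbar||
```

The disagreement among the projected points is thus controlled by the disagreement among the unprojected ones, which is what allows Lemma 4 to be used in the sequel.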

Combining relations (13) and (12), and substituting the resulting relation in Eq. (11), we obtain

$$\sum_{i=1}^m \big(f_i(z_i(k)) - f_i(x)\big) \ge -2G_f \sum_{i=1}^m \|v_i(k) - \bar v(k)\| + \big(f(\bar z(k)) - f(x)\big). \tag{14}$$

Finally, by using the preceding estimate in inequality (10) and letting $x = x^*$ for an arbitrary
$x^* \in X^*$, we have w.p.1 for any $x^* \in X^*$ and $k \ge \tilde k$,

J2 E [\\xi(k) -x*\\ 2 | JFu. x ] < ( 1 + -^- ) E IW fc " !) 

~e [/(^)) - d i J-*,!] + ^ E e ii w*) - «(*) ii i Ji-i] + g^. (is) 

«=i 

Since $\bar z(k) \in X$, we have $f(\bar z(k)) - f^* \ge 0$. Thus, in the light of Lemma 4, relation (15) satisfies all the conditions of Lemma 1. Hence, the sequence $\{\|x_i(k) - x^*\|^2\}$ is convergent for any $i \in V$ and $x^* \in X^*$ w.p.1, and $\sum_{k=1}^\infty \frac{1}{k}\big(f(\bar z(k)) - f^*\big) < \infty$ w.p.1. Since $\sum_{k=1}^\infty \frac{1}{k} = \infty$, it follows that

$$\liminf_{k\to\infty} \big(f(\bar z(k)) - f^*\big) = 0 \quad \text{w.p.1.} \tag{16}$$

By Lemma 3(a), noting that $z_i(k) = \Pi_X[v_i(k)]$, we have

$$\lim_{k\to\infty} \|v_i(k) - z_i(k)\| = 0 \quad \text{for all } i \in V \text{ w.p.1.} \tag{17}$$

Since the sequence $\{\|x_i(k) - x^*\|\}$ is convergent with probability 1 for any $i \in V$ and every $x^* \in X^*$, in view of the relations (4a) and (17), respectively, so are the sequences $\{\|v_i(k) - x^*\|\}$ and $\{\|z_i(k) - x^*\|\}$, as well as their average sequences $\{\|\bar v(k) - x^*\|\}$ and $\{\|\bar z(k) - x^*\|\}$. Therefore, the sequences $\{\bar v(k)\}$ and $\{\bar z(k)\}$ are bounded with probability 1, and they have accumulation points. From relation (16) and the continuity of $f$, the sequence $\{\bar z(k)\}$ must have one accumulation point in $X^*$ with probability 1. This and the fact that $\{\|\bar z(k) - x^*\|\}$ is convergent with probability 1 for every $x^* \in X^*$ imply that for a random point $x^* \in X^*$,

$$\lim_{k\to\infty} \bar z(k) = x^* \quad \text{w.p.1.} \tag{18}$$

Now, from $\bar z(k) = \frac{1}{m}\sum_{\ell=1}^m z_\ell(k)$ and $\bar v(k) = \frac{1}{m}\sum_{\ell=1}^m v_\ell(k)$, using relation (17) and the convexity of the norm, we obtain $\lim_{k\to\infty}\|\bar v(k) - \bar z(k)\| \le \frac{1}{m}\sum_{\ell=1}^m \lim_{k\to\infty}\|v_\ell(k) - z_\ell(k)\| = 0$ w.p.1. In view of relation (18), it follows that

$$\lim_{k\to\infty}\bar v(k) = x^* \quad \text{w.p.1.} \tag{19}$$

By Lemma 4, we have

$$\liminf_{k\to\infty}\|v_i(k) - \bar v(k)\| = 0 \quad \text{for all } i\in V \text{ w.p.1.} \tag{20}$$

The fact that $\{\|v_i(k) - x^*\|\}$ is convergent with probability 1 for all $i$ and any $x^* \in X^*$, together with (19) and (20), implies that

$$\lim_{k\to\infty}\|v_i(k) - x^*\| = 0 \quad \text{for all } i\in V \text{ w.p.1.} \tag{21}$$

Finally, by Lemma 3(b), we have $\lim_{k\to\infty}\|x_i(k) - v_i(k)\| = 0$ for all $i \in V$ w.p.1, which together with the limit in (21) yields $\lim_{k\to\infty} x_i(k) = x^*$ for all $i\in V$ with probability 1. ■

V. Error Bound 

Here, we prove Proposition 2. We start by providing some lemmas that are valid for a constant stepsize $\alpha_i(k) = \alpha_i > 0$. The first result shows a basic iterate relation.

Lemma 5: Let Assumptions 2-4 hold, where the stepsize satisfies Assumption 4(a). Then, for the iterates $x_i(k)$ of the method we have w.p.1 for any $x \in X$, and for all $k \ge 1$ and $i \in \{I_k, J_k\}$,

$$E\big[\|x_i(k) - x\|^2 \mid \mathcal{F}_{k-1}, I_k, J_k\big] \le (1 - \rho_i)\|v_i(k) - x\|^2 - 2\alpha_i\langle\nabla f_i(x), z_i(k) - x\rangle + 8(1 + c)\alpha_i^2 G_f^2,$$

where $\rho_i = \sigma_i\alpha_i - 4(2 + c)\alpha_i^2 L_i^2$ and $z_i(k) = \Pi_X[v_i(k)]$.
The proof of the preceding lemma is in Appendix D.

Next, we provide an asymptotic estimate for the disagreement among the agents. 

Lemma 6: Let Assumptions 1-4 hold. Let $\bar x(k) = \frac{1}{m}\sum_{i=1}^m x_i(k)$ for all $k$. Then, for the iterates $\{x_i(k)\}$ generated by method (4a)-(4c), we have

$$\limsup_{k\to\infty} \sum_{i=1}^m E\big[\|x_i(k) - \bar x(k)\|^2\big] \le \frac{4m\bar\alpha^2 G_f^2}{(1 - \sqrt{\lambda})^2}\, C,$$

where C = 87( ^° { ^ 1 } +c) + 1, $\rho_i = \sigma_i\alpha_i - 4(2 + c)\alpha_i^2 L_i^2$, $\bar\gamma = \max_i \gamma_i$, $\bar\alpha = \max_i \alpha_i$, and $L = \max_i L_i$.

The proof of the lemma is given in Appendix E. The bound in Lemma 6 captures the variance of the estimates $x_i(k)$ in terms of the number of agents, the maximum stepsize and the spectral gap $1 - \sqrt{\lambda}$ of the matrix $\bar W$.

We are now ready to prove Proposition 2. In the proof we use a relation implied by the convexity of the squared-norm. In particular, by the definition of $v_i(k)$ in (4a), the convexity of the squared-norm function and the doubly stochastic weights $W(k)$, we have for any $x \in \mathbb{R}^d$,

$$\sum_{i=1}^m E\big[\|v_i(k) - x\|^2 \mid \mathcal{F}_{k-1}\big] \le \sum_{i=1}^m\sum_{j=1}^m [\bar W]_{ij}\,\|x_j(k-1) - x\|^2 = \sum_{j=1}^m \|x_j(k-1) - x\|^2. \tag{22}$$

Proof of Proposition 2. The function $f$ is strongly convex with a constant $\sigma = \sum_{i=1}^m \sigma_i$ and therefore, problem (1) has a unique optimal solution $x^*$. The proof starts with the relation of Lemma 5, where we let $x = x^*$. Define $\bar z(k) = \frac{1}{m}\sum_{i=1}^m z_i(k)$, so that $\bar z(k) \in X$. We have $\langle\nabla f_i(x^*), z_i(k) - x^*\rangle = \langle\nabla f_i(x^*), \bar z(k) - x^*\rangle + \langle\nabla f_i(x^*), z_i(k) - \bar z(k)\rangle$, which in view of the gradient boundedness (Assumption 2(d)) implies that

$$\langle\nabla f_i(x^*), z_i(k) - x^*\rangle \ge \langle\nabla f_i(x^*), \bar z(k) - x^*\rangle - G_f\|z_i(k) - \bar z(k)\|.$$

Using the preceding relation and Lemma 5, we have for all $k \ge 1$ w.p.1,

$$E\big[\|x_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}, I_k, J_k\big] \le (1-\rho_i)\|v_i(k) - x^*\|^2 - 2\alpha_i\langle\nabla f_i(x^*), \bar z(k) - x^*\rangle + 2\alpha_i G_f\|z_i(k) - \bar z(k)\| + 8(1+c)\alpha_i^2 G_f^2.$$

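Taking the expectation with respect to $\mathcal{F}_{k-1}$ and using the fact that the preceding inequality holds with probability $\gamma_i$, and otherwise we have $x_i(k) = v_i(k)$ with probability $1 - \gamma_i$, we obtain w.p.1 for all $k \ge 1$ and $i \in V$,

$$\begin{aligned} E\big[\|x_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] \le{} & (1 - \gamma_i\rho_i)\,E\big[\|v_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] - 2\gamma_i\alpha_i\, E\big[\langle\nabla f_i(x^*), \bar z(k) - x^*\rangle \mid \mathcal{F}_{k-1}\big] \\ & + 2\gamma_i\alpha_i G_f\, E\big[\|z_i(k) - \bar z(k)\| \mid \mathcal{F}_{k-1}\big] + 8(1+c)\gamma_i\alpha_i^2 G_f^2. \end{aligned}$$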

We note that under the assumption that pi = a^Oi — 8(1 + c)a 2 L 2 E (0, 1) for all i, we also have 
jiPi E (0, 1) for all i since j t E (0, 1). By adding and subtracting 2 min J {7 :; a : ,}E[(V/j(x*), z{k) — 
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x*) | .Ffc_i], we find that 

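$$\begin{aligned}
\mathsf{E}\big[\|x_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] \le{}& (1 - \gamma_i p_i)\,\mathsf{E}\big[\|v_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] \\
&- 2\underline{\gamma}\,\underline{\alpha}\,\mathsf{E}\big[\langle \nabla f_i(x^*), \bar{z}(k) - x^*\rangle \mid \mathcal{F}_{k-1}\big] + 2\Delta_{\gamma\alpha}\,\mathsf{E}\big[\|\nabla f_i(x^*)\|\,\|\bar{z}(k) - x^*\| \mid \mathcal{F}_{k-1}\big] \\
&+ 2\bar{\gamma}\bar{\alpha} G_f\,\mathsf{E}\big[\|z_i(k) - \bar{z}(k)\| \mid \mathcal{F}_{k-1}\big] + 8(1+c)\bar{\gamma}\bar{\alpha}^2 G_f^2,
\end{aligned} \tag{23}$$

where $\Delta_{\gamma\alpha} = \max_j\{\gamma_j\alpha_j\} - \min_j\{\gamma_j\alpha_j\}$, $\underline{\alpha} = \min_j \alpha_j$, $\underline{\gamma} = \min_j \gamma_j$, $\bar{\alpha} = \max_j \alpha_j$ and $\bar{\gamma} = \max_j \gamma_j$. We can further estimate

$$\|\nabla f_i(x^*)\|\,\|\bar{z}(k) - x^*\| \le \frac{G_f}{m}\sum_{j=1}^m \big\|\Pi_{\mathcal{X}}[v_j(k)] - x^*\big\| \le \frac{G_f}{m}\sum_{j=1}^m \|v_j(k) - x^*\|,$$

where the first inequality follows by Assumption 2(d), $\bar{z}(k) = \frac{1}{m}\sum_{j=1}^m \Pi_{\mathcal{X}}[v_j(k)]$ and the convexity of the norm function, while the second inequality follows from the non-expansive projection property. Also, from the relation $ab \le \frac{1}{2}(a^2 + b^2)$ and the convexity of the square-function, we obtain

$$\|\nabla f_i(x^*)\|\,\|\bar{z}(k) - x^*\| \le \frac{1}{2}G_f^2 + \frac{1}{2m}\sum_{j=1}^m \|v_j(k) - x^*\|^2. \tag{24}$$

Summing relations in Eq. (23) over $i$, and using estimates (22), (24) and $\sum_{i=1}^m \langle \nabla f_i(x^*), \bar{z}(k) - x^*\rangle \ge f(\bar{z}(k)) - f(x^*) \ge 0$ (which holds by the optimality of $x^*$), we have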

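$$\begin{aligned}
\sum_{i=1}^m \mathsf{E}\big[\|x_i(k) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] \le{}& (1 - q)\sum_{i=1}^m \mathsf{E}\big[\|x_i(k-1) - x^*\|^2 \mid \mathcal{F}_{k-1}\big] + \Delta_{\gamma\alpha}\, m\, G_f^2 \\
&+ 2\bar{\gamma}\bar{\alpha} G_f \sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\| \mid \mathcal{F}_{k-1}\big] + 8(1+c)\, m\, \bar{\gamma}\bar{\alpha}^2 G_f^2,
\end{aligned}$$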
where 5 = minj^p,;} — A 7Q /m. Since ^pi ^- e (0, 1) by Assumption BJb), it follows that 

q E (0, 1), and therefore 

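$$\limsup_{k\to\infty} \sum_{i=1}^m \mathsf{E}\big[\|x_i(k) - x^*\|^2\big] \le \frac{2\bar{\gamma}\bar{\alpha} G_f}{q}\,\limsup_{k\to\infty} \sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\|\big] + \frac{1}{q}\Big(\Delta_{\gamma\alpha}\, m\, G_f^2 + 8(1+c)\, m\, \bar{\gamma}\bar{\alpha}^2 G_f^2\Big). \tag{25}$$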

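We now consider the sum $\sum_{i=1}^m \mathsf{E}[\|z_i(k) - \bar{z}(k)\|]$. Using Hölder's inequality, we have

$$\sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\|\big] \le \sqrt{m \sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\|^2\big]}. \tag{26}$$

Since $\bar{z}(k) = \frac{1}{m}\sum_{i=1}^m \Pi_{\mathcal{X}}[v_i(k)]$, it follows that for $\bar{v}(k) = \frac{1}{m}\sum_{i=1}^m v_i(k)$,

$$\sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\|^2\big] \le \sum_{i=1}^m \mathsf{E}\big[\|\Pi_{\mathcal{X}}[v_i(k)] - \Pi_{\mathcal{X}}[\bar{v}(k)]\|^2\big] \le \sum_{i=1}^m \mathsf{E}\big[\|v_i(k) - \bar{v}(k)\|^2\big], \tag{27}$$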



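where the last inequality is obtained from the non-expansive projection property. Since $\bar{v}(k)$ is the average of $v_j(k)$ for $j \in V$, it follows that for $\bar{x}(k-1) = \frac{1}{m}\sum_{j=1}^m x_j(k-1)$,

$$\sum_{i=1}^m \mathsf{E}\big[\|v_i(k) - \bar{v}(k)\|^2\big] \le \sum_{i=1}^m \mathsf{E}\big[\|v_i(k) - \bar{x}(k-1)\|^2\big].$$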



From the preceding relation, and using Eq. (22) with $x = \bar{x}(k-1)$ (where we take the total expectation), we find that

$$\sum_{i=1}^m \mathsf{E}\big[\|v_i(k) - \bar{v}(k)\|^2\big] \le \sum_{i=1}^m \mathsf{E}\big[\|x_i(k-1) - \bar{x}(k-1)\|^2\big]. \tag{28}$$



From Eqs. (26)-(28) and Lemma 6, we obtain

$$\limsup_{k\to\infty} \sum_{i=1}^m \mathsf{E}\big[\|z_i(k) - \bar{z}(k)\|\big] \le 2m\sqrt{C}\,\frac{\bar{\alpha} G_f}{1 - \sqrt{\lambda}}.$$

The result follows from Eq. (25) after dividing by $m$. ■



VI. Simulations: Distributed Robust Control 

In this section, we apply our GRP algorithm to a distributed robust model predictive control (MPC) problem [6]. A linear, time-invariant, discrete-time system is given by the following state equation for $t = 1, \ldots, T$,

$$x(t) = A x(t-1) + B u(t), \tag{29}$$



where

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix},$$

with initial state $x(0) = [7, 0]'$. The goal of the agents on the network is to find an optimal control $u = [u(1), \ldots, u(T)]'$ of system (29) over time $t = 1, \ldots, T$, with some random terminal constraints. The distributed optimization problem is given by



$$\min_{u}\; f(u) = \sum_{i=1}^m f_i(u) \quad \text{s.t. } u \in \mathcal{X}, \tag{30}$$

where

$$f_i(u) = \sum_{t=1}^T \|x(t) - z_i\|^2 + r\,u(t), \quad \text{for } i = 1, \ldots, m,$$

is the local objective of agent $i$ and $r > 0$ is a control parameter. Hence, the agents on the network jointly find a control $u$ such that the resulting trajectory $x(t)$, for $t = 1, \ldots, T$, minimizes the deviations from the points $z_i \in \mathbb{R}^2$ together with the control effort. The information about the points $z_i$, for $i = 1, \ldots, m$, is private and only agent $i$ knows the location $z_i$.

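The constraint set $\mathcal{X}$ is a set of control inputs that satisfies the following constraints.

$$\|u(t)\|_\infty \le 2, \quad \text{for } t = 1, \ldots, T, \tag{31a}$$

$$x(t) = A x(t-1) + B u(t), \quad \text{for } t = 1, \ldots, T, \tag{31b}$$

$$\max_{\ell = 1,2,3,4} \big\{ (a_\ell + \delta_\ell)' x(T) - b_\ell \big\} \le 0. \tag{31c}$$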

The system is initiated in state $x(0) = [7, 0]'$. The constraint (31a) is just a box constraint, while the constraints in (31b) describe the system dynamics. The constraints in (31c) describe the random terminal conditions given by the linear inequalities $(a_\ell + \delta_\ell)' x(T) \le b_\ell$, and the perturbations $\delta_\ell$ are uniform random vectors in boxes $\|\delta_\ell\|_\infty \le \beta_\ell$ for some given scalars $\beta_\ell$. Note that $u(t)$, for $t = 1, \ldots, T$, are the only variables here since $x(t)$, for $t = 1, \ldots, T$, are fully determined by state equations (31b) once $u(t)$, for $t = 1, \ldots, T$, is given.
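To illustrate that the trajectory is a function of the control sequence alone, the following Python sketch rolls out the state equation (31b). The numerical values of $A$ and $B$ are an assumption: they are read from a partly illegible matrix display as the double integrator $A = [1\ 1;\ 0\ 1]$, $B = [0.5;\ 1]$. The function name `rollout` is illustrative.

```python
# Assumed system data (reconstructed from a damaged display; treat as hypothetical).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.5, 1.0]
X0 = [7.0, 0.0]          # initial state x(0) = [7, 0]'

def rollout(u, x0=X0):
    """Return [x(1), ..., x(T)] generated by x(t) = A x(t-1) + B u(t)."""
    xs, x = [], list(x0)
    for ut in u:
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * ut,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * ut]
        xs.append(x)
    return xs

print(rollout([-2.0, -2.0]))   # trajectory under the most negative admissible input
```

Since every $x(t)$ is an affine function of $u$, each terminal inequality in (31c) is, for a fixed realization of $\delta_\ell$, a halfspace in the variable $u$.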

For this problem, we have $\mathcal{X}_i = \mathcal{X}$ for all $i$. The constraint set $\mathcal{X}$ is uncertain and not exactly known in advance since the perturbations are uniform random vectors in boxes. To apply the GRP algorithm (4a)-(4c) in solving this robust optimal control problem, at iteration $k$, each of the agents $I_k$ and $J_k$ draws a realization of one of the linear inequality terminal constraints and projects its current iterate on the selected constraint. Subsequently, they perform their projections onto the box constraint (31a).
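The two projections described above have closed forms. The sketch below is a minimal Python illustration, assuming the sampled terminal constraint has already been rewritten as a halfspace $\{u : g'u \le h\}$ in the control variable; the names `g`, `h`, `project_halfspace`, `project_box` and `grp_projection_step` are hypothetical and do not come from the paper.

```python
def project_halfspace(u, g, h):
    """Euclidean projection of u onto the halfspace {w : g'w <= h}."""
    violation = sum(gi * ui for gi, ui in zip(g, u)) - h
    if violation <= 0.0:
        return list(u)                      # already feasible, nothing to do
    gg = sum(gi * gi for gi in g)
    return [ui - violation * gi / gg for gi, ui in zip(g, u)]

def project_box(u, bound=2.0):
    """Projection onto the box |u(t)| <= bound, as in (31a)."""
    return [max(-bound, min(bound, ui)) for ui in u]

def grp_projection_step(u, g, h, bound=2.0):
    """Project on the sampled linear constraint first, then on the box."""
    return project_box(project_halfspace(u, g, h), bound)

print(grp_projection_step([3.0, 0.0], [1.0, 0.0], 1.0))
```

Note that the sequential projection is what the text describes; it is in general not the exact projection onto the intersection of the halfspace and the box.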

Since the uncertainty exists in a box, the problem (30) has an equivalent Quadratic Programming (QP) formulation. Note that the following representations are all equivalent:

$$(a_\ell + \delta_\ell)' x(T) \le b_\ell, \quad \forall\, \delta_\ell : \|\delta_\ell\|_\infty \le \beta_\ell \tag{32a}$$

$$\Leftrightarrow\quad \max_{\|\delta_\ell\|_\infty \le \beta_\ell} \delta_\ell' x(T) \le b_\ell - a_\ell' x(T) \tag{32b}$$

$$\Leftrightarrow\quad a_\ell' x(T) + \beta_\ell\,\big|[x(T)]_1\big| + \beta_\ell\,\big|[x(T)]_2\big| \le b_\ell. \tag{32c}$$

Therefore, the inequality (31c) admits an equivalent representation of (32c) by a system of linear inequalities with additional variables $t_1$ and $t_2$:

$$-t_j \le [x(T)]_j \le t_j, \quad \text{for } j = 1, 2, \tag{33a}$$

$$\max_{\ell = 1,2,3,4} \big\{ a_\ell' x(T) + \beta_\ell t_1 + \beta_\ell t_2 - b_\ell \big\} \le 0. \tag{33b}$$

This alternative representation is only available since we are considering simple box uncertainty 
sets for the sake of comparison. Note that our GRP algorithm is applicable not just to box 
uncertainty but to more complicated perturbations such as Gaussian or other distributions. 
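The step from (32b) to (32c) uses the fact that the worst case of $\delta_\ell' x(T)$ over the box $\|\delta_\ell\|_\infty \le \beta_\ell$ equals $\beta_\ell \|x(T)\|_1$. The following Python sketch checks this numerically by enumerating the vertices of the box (a linear function attains its maximum over a box at a vertex). The function names and the sample values are illustrative.

```python
from itertools import product

def worst_case_over_box(x, beta):
    """max of delta'x over ||delta||_inf <= beta, found by enumerating box vertices."""
    return max(sum(s * beta * xi for s, xi in zip(signs, x))
               for signs in product((-1.0, 1.0), repeat=len(x)))

def l1_closed_form(x, beta):
    """Closed form beta * ||x||_1 used in (32c)."""
    return beta * sum(abs(xi) for xi in x)

x_T = [1.5, -0.5]        # a sample terminal state
beta = 0.2               # a sample box radius
print(worst_case_over_box(x_T, beta), l1_closed_form(x_T, beta))
```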

In the experiment, we use $m = 4$ and $m = 10$ agents with $T = 10$ and $r = 0.1$. We solve the problem on three different network topologies, namely, clique, cycle and star (see Figure 1). For the agent selection probability, we use the uniform distribution, i.e., at each iteration, one of the $m$ agents is uniformly selected and the selected agent uniformly selects one of its neighbors. Table I shows the second largest eigenvalue $\lambda$ of $\bar{W}$ for the three network topologies when $m = 4$ and $m = 10$. When $m$ is larger, we can see that $\lambda$ is very close to one for all of the three cases.
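The eigenvalues in Table I can be reproduced with a short computation. The Python sketch below builds $\bar{W} = \mathsf{E}[W(k)]$ under the selection rule described above, assuming the pairwise-averaging form $W(k) = I - \frac{1}{2}(e_i - e_j)(e_i - e_j)^T$, and estimates the second largest eigenvalue by power iteration on the subspace orthogonal to the all-ones vector. All function names are illustrative; the power iteration is a convenience choice made here, not a method taken from the paper.

```python
def expected_gossip_matrix(neighbors):
    """W_bar = E[W(k)]: agent i wakes w.p. 1/m and picks a neighbor j w.p. 1/deg(i)."""
    m = len(neighbors)
    W = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            p = 1.0 / (m * len(nbrs))
            W[i][i] -= 0.5 * p
            W[j][j] -= 0.5 * p
            W[i][j] += 0.5 * p
            W[j][i] += 0.5 * p
    return W

def second_largest_eigenvalue(W, iters=2000):
    """Power iteration restricted to vectors orthogonal to the all-ones vector."""
    m = len(W)
    v = [float(i + 1) ** 2 for i in range(m)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / m
        v = [vi - mean for vi in v]                 # remove the component along 1
        norm = sum(vi * vi for vi in v) ** 0.5
        v = [vi / norm for vi in v]
        w = [sum(W[r][c] * v[c] for c in range(m)) for r in range(m)]
        lam = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient
        v = w
    return lam

def clique(m):
    return [[j for j in range(m) if j != i] for i in range(m)]

def cycle(m):
    return [[(i - 1) % m, (i + 1) % m] for i in range(m)]

def star(m):
    return [list(range(1, m))] + [[0] for _ in range(1, m)]

for m in (4, 10):
    print(m, [round(second_largest_eigenvalue(expected_gossip_matrix(g(m))), 4)
              for g in (clique, cycle, star)])
```

Under these assumptions the computed values agree with the entries of Table I to four decimal places.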

We evaluate the algorithm performance by carrying out 100 Monte-Carlo runs, each with 40,000 iterations for $m = 4$ and 100,000 iterations for $m = 10$. For the stepsize, we use either a diminishing one ($1/\Gamma_i(k)$) or a constant $\alpha_i = 10^{-5}$ for $m = 4$ and $\alpha_i = 10^{-6}$ for $m = 10$.

In Figures 2 and 3, we depict $\frac{1}{m}\sum_{i=1}^m \|u_i(k) - u^*\|^2$ over 40,000 and 100,000 iterations when the diminishing and constant stepsize are used, respectively. The optimal solution $u^*$ was obtained by solving the equivalent QP problem (i.e., problem (30) with constraints (31a)-(31b) and (33a)-(33b)) using a commercial QP solver.

We can observe for both cases that the errors go down fast. An interesting observation is that the network topology does not affect the algorithm performance when the diminishing stepsize is used. When the constant stepsize is used for the $m = 4$ case, the star network converges much more slowly than the other two networks. This is because the agent selection probability $\gamma_i$ is different for the center node and the peripheral nodes. As the bound in Proposition 2 captures, a more aggressive stepsize $\alpha_i$ should have been used for the peripheral nodes. For the $m = 10$ case, however, the difference is not as clearly visible as in the $m = 4$ case. This can be explained by the almost identical spectral gap $1 - \sqrt{\lambda}$ (as shown in Proposition 2 and Table I).

TABLE I
Number of agents and $\lambda$

m     Clique    Cycle     Star
4     0.6667    0.7500    0.8333
10    0.8889    0.9809    0.9444




Fig. 1. Clique (left), cycle (center) and star (right) graph used for communication topology (m = 4) 

VII. Conclusions 

We have considered a distributed problem of minimizing the sum of agents' objective functions over a distributed constraint set $\mathcal{X}_i$. We proposed an asynchronous gossip-based random projection algorithm for solving the problem over a network. We studied the convergence properties of the algorithm for a random diminishing stepsize and a constant deterministic stepsize. We established convergence with probability 1 to an optimal solution when the diminishing stepsizes are used and an error bound when constant stepsizes are used. We have also provided a simulation result for a distributed robust model predictive control problem.

Appendix

A. Proof of Lemma [2] 

We begin with a lemma which provides some basic relations for a vector $\check{x} \in \mathcal{Y}$, an arbitrary point $z \in \mathbb{R}^d$, and two consecutive iterates $x$ and $y$ of a projected-gradient algorithm. The auxiliary point $z$ will be used to accommodate the iterates $v_i(k)$ of the GRP method, which may not belong to the constraint set $\mathcal{X}$, while $\check{x}$ will be a suitably chosen point in $\mathcal{X}$.

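Lemma 7: Let $\mathcal{Y} \subseteq \mathbb{R}^d$ be a closed convex set. Let the function $\phi : \mathbb{R}^d \to \mathbb{R}$ be convex and differentiable over $\mathbb{R}^d$ with Lipschitz continuous gradients with a constant $L$. Let $y$ be given by $y = \Pi_{\mathcal{Y}}[x - \alpha\nabla\phi(x)]$ for some $x \in \mathbb{R}^d$ and $\alpha > 0$. Then, for all $\check{x} \in \mathcal{Y}$ and $z \in \mathbb{R}^d$, we have:

(a) For any scalars $\tau_1, \tau_2 > 0$,

$$\begin{aligned}
\|y - \check{x}\|^2 \le{}& (1 + 8\alpha^2 L^2)\,\|x - \check{x}\|^2 - 2\alpha\big(\phi(z) - \phi(\check{x})\big) - \frac{3}{4}\|y - x\|^2 \\
&+ (8 + \tau_2)\,\alpha^2\,\|\nabla\phi(\check{x})\|^2 + \tau_1\alpha^2 L^2\,\|z - \check{x}\|^2 + \Big(\frac{1}{\tau_1} + \frac{1}{\tau_2}\Big)\|x - z\|^2.
\end{aligned}$$

(b) In addition, if $\phi$ is strongly convex on $\mathbb{R}^d$ with a constant $\sigma > 0$, then for any $\tau_1, \tau_2 > 0$,

$$\begin{aligned}
\|y - \check{x}\|^2 \le{}& (1 - \alpha\sigma + 8\alpha^2 L^2)\,\|x - \check{x}\|^2 - 2\alpha\langle \nabla\phi(\check{x}), z - \check{x}\rangle - \frac{3}{4}\|y - x\|^2 \\
&+ (8 + \tau_2)\,\alpha^2\,\|\nabla\phi(\check{x})\|^2 + \tau_1\alpha^2 L^2\,\|z - \check{x}\|^2 + \Big(\frac{1}{\tau_1} + \frac{1}{\tau_2}\Big)\|x - z\|^2.
\end{aligned}$$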

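Proof: For part (a), from the relation defining $y$ and the strictly non-expansive projection property, we obtain for any $\check{x} \in \mathcal{Y}$,

$$\|y - \check{x}\|^2 \le \|x - \check{x}\|^2 - 2\alpha\langle \nabla\phi(x), x - \check{x}\rangle - \|y - x\|^2 + 2\alpha\langle \nabla\phi(x), x - y\rangle. \tag{34}$$

We next estimate the term $2\alpha\langle \nabla\phi(x), x - y\rangle$. By using the Cauchy-Schwarz inequality we obtain $2\alpha\langle \nabla\phi(x), x - y\rangle \le 2\alpha\|\nabla\phi(x)\|\,\|x - y\|$. By writing $2\alpha\|\nabla\phi(x)\|\,\|x - y\| = 2\big(2\alpha\|\nabla\phi(x)\|\big)\big(\|x - y\|/2\big)$, we find that

$$2\alpha\langle \nabla\phi(x), x - y\rangle \le 4\alpha^2\|\nabla\phi(x)\|^2 + \frac{1}{4}\|x - y\|^2. \tag{35}$$

Furthermore, we have $\|\nabla\phi(x)\|^2 \le \|(\nabla\phi(x) - \nabla\phi(\check{x})) + \nabla\phi(\check{x})\|^2$, which by the square-function property $(a + b)^2 \le 2(a^2 + b^2)$ yields $\|\nabla\phi(x)\|^2 \le 2\|\nabla\phi(x) - \nabla\phi(\check{x})\|^2 + 2\|\nabla\phi(\check{x})\|^2$. The preceding relation and the Lipschitz gradient property of $\phi$ imply

$$\|\nabla\phi(x)\|^2 \le 2L^2\|x - \check{x}\|^2 + 2\|\nabla\phi(\check{x})\|^2. \tag{36}$$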

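Therefore, from (34)-(36) we obtain

$$\|y - \check{x}\|^2 \le (1 + 8\alpha^2 L^2)\,\|x - \check{x}\|^2 - 2\alpha\langle \nabla\phi(x), x - \check{x}\rangle - \frac{3}{4}\|y - x\|^2 + 8\alpha^2\|\nabla\phi(\check{x})\|^2. \tag{37}$$

Next, we estimate the term $2\alpha\langle \nabla\phi(x), x - \check{x}\rangle$ using the convexity of $\phi$,

$$\langle \nabla\phi(x), x - \check{x}\rangle \ge \phi(x) - \phi(\check{x}) = \big(\phi(x) - \phi(z)\big) + \big(\phi(z) - \phi(\check{x})\big), \tag{38}$$

where $z \in \mathbb{R}^d$ is some given point. It remains to bound the term $\phi(x) - \phi(z)$, for which by convexity of $\phi$ we further have

$$\phi(x) - \phi(z) \ge \langle \nabla\phi(z), x - z\rangle \ge -\|\nabla\phi(z)\|\,\|x - z\|.$$

By writing $\|\nabla\phi(z)\| \le \|\nabla\phi(z) - \nabla\phi(\check{x})\| + \|\nabla\phi(\check{x})\|$ and using the Lipschitz-gradient property of $\phi$, we obtain

$$\phi(x) - \phi(z) \ge -L\,\|z - \check{x}\|\,\|x - z\| - \|\nabla\phi(\check{x})\|\,\|x - z\|.$$

Multiplying the preceding relation with $2\alpha$ and using $2\alpha L\|z - \check{x}\|\,\|x - z\| = 2\big(\alpha\sqrt{\tau_1}L\|z - \check{x}\|\big)\big(\|x - z\|/\sqrt{\tau_1}\big) \le \tau_1\alpha^2 L^2\|z - \check{x}\|^2 + \|x - z\|^2/\tau_1$ and $2\alpha\|\nabla\phi(\check{x})\|\,\|x - z\| = 2\big(\alpha\sqrt{\tau_2}\|\nabla\phi(\check{x})\|\big)\big(\|x - z\|/\sqrt{\tau_2}\big) \le \tau_2\alpha^2\|\nabla\phi(\check{x})\|^2 + \|x - z\|^2/\tau_2$ for some $\tau_1, \tau_2 > 0$, we obtain

$$2\alpha\big(\phi(x) - \phi(z)\big) \ge -\tau_1\alpha^2 L^2\|z - \check{x}\|^2 - \tau_2\alpha^2\|\nabla\phi(\check{x})\|^2 - \Big(\frac{1}{\tau_1} + \frac{1}{\tau_2}\Big)\|x - z\|^2. \tag{39}$$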

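Thus, from Eqs. (37)-(39) it follows that

$$\begin{aligned}
\|y - \check{x}\|^2 \le{}& (1 + 8\alpha^2 L^2)\,\|x - \check{x}\|^2 - 2\alpha\big(\phi(z) - \phi(\check{x})\big) - \frac{3}{4}\|y - x\|^2 \\
&+ (8 + \tau_2)\,\alpha^2\,\|\nabla\phi(\check{x})\|^2 + \tau_1\alpha^2 L^2\,\|z - \check{x}\|^2 + \Big(\frac{1}{\tau_1} + \frac{1}{\tau_2}\Big)\|x - z\|^2,
\end{aligned} \tag{40}$$

thus proving the relation in part (a). The relation in part (b) follows similarly by using the strong convexity of $\phi$ in Eq. (38), i.e., $\langle \nabla\phi(x), x - \check{x}\rangle \ge \phi(x) - \phi(\check{x}) + \frac{\sigma}{2}\|x - \check{x}\|^2$ for all $x, \check{x} \in \mathbb{R}^d$. ■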
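The inequality of Lemma 7(a) can be probed numerically on a simple instance. The Python sketch below is only an illustration under stated assumptions: it takes $\phi(w) = \frac{a}{2}\|w - c\|^2$ (so that $L = a$) and the box $\mathcal{Y} = [-1, 1]^d$, draws random points and parameters, and reports the smallest value of the right-hand side minus the left-hand side. The test function, the box and all names are choices made here, not part of the paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def lemma7a_gap(x, x_check, z, c, a, alpha, tau1, tau2):
    """RHS minus LHS of Lemma 7(a) for phi(w) = (a/2)||w - c||^2 (so L = a)
    and Y = [-1, 1]^d; x_check must lie in Y. Nonnegative means it holds."""
    phi = lambda w: 0.5 * a * sq_dist(w, c)
    grad_sq = a * a * sq_dist(x_check, c)      # ||grad phi(x_check)||^2
    # y = Pi_Y[x - alpha * grad phi(x)]; projecting onto a box is a clip
    y = [max(-1.0, min(1.0, xi - alpha * a * (xi - ci))) for xi, ci in zip(x, c)]
    rhs = ((1 + 8 * alpha**2 * a**2) * sq_dist(x, x_check)
           - 2 * alpha * (phi(z) - phi(x_check))
           - 0.75 * sq_dist(y, x)
           + (8 + tau2) * alpha**2 * grad_sq
           + tau1 * alpha**2 * a**2 * sq_dist(z, x_check)
           + (1 / tau1 + 1 / tau2) * sq_dist(x, z))
    return rhs - sq_dist(y, x_check)

def min_gap(trials=500, d=3, seed=0):
    """Smallest gap over random instances; x_check is sampled inside the box."""
    rng = random.Random(seed)
    vec = lambda lo, hi: [rng.uniform(lo, hi) for _ in range(d)]
    return min(
        lemma7a_gap(vec(-3, 3), vec(-1, 1), vec(-3, 3), vec(-2, 2),
                    rng.uniform(0.1, 5), rng.uniform(0.01, 2),
                    rng.uniform(0.1, 10), rng.uniform(0.1, 10))
        for _ in range(trials))

print(min_gap() >= 0.0)
```

A random search of this kind cannot prove the lemma; it only guards against transcription errors in the constants.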

The proof of Lemma 2 relies on Lemma 7(a) and the fact that the event $E_i(k) = \{i \in \{I_k, J_k\}\}$ that agent $i$ updates at any time is independent of the past. Due to this, the number of updates $\Gamma_i(k)$ that any agent $i$ has performed until time $k$ grows proportionally to $k$, so the stepsize $\alpha_i(k) = \frac{1}{\Gamma_i(k)}$ behaves almost as $1/k$ when $k$ is large enough. The long term estimates for the stepsize $\alpha_i(k)$ in terms of the probability $\gamma_i$ that agent $i$ updates are given in the following lemma.

Lemma 8: (see [11]) Let $\alpha_i(k) = 1/\Gamma_i(k)$ for all $k \ge 1$ and $i \in V$. Let $\pi_{\min} = \min_{\{i,j\} \in E} \pi_{ij}$. Also, let $q$ be a constant such that $0 < q < 1/2$. Then, there exists a large enough $\tilde{k}$ (which depends on $q$ and $m$) such that with probability 1 for all $k \ge \tilde{k}$ and $i \in V$,

$$\text{(a)}\;\; \alpha_i(k) \le \frac{2}{k\gamma_i}, \qquad \text{(b)}\;\; \alpha_i^2(k) \le \frac{4m^2}{k^2(1 + \pi_{\min})^2},$$
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(C) 



<k) ~ ^ 



< 



According to this lemma, the stepsizes $\alpha_i(k)$ exhibit the same behavior as the deterministic stepsize $1/k$ in the long run. The result is critical for dealing with the cross dependencies of the random stepsizes and the other randomness in the GRP method.
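The long-run behavior of the random stepsize can be observed in a small simulation. The Python sketch below assumes that an agent updates independently at each iteration with a fixed probability $\gamma$ and that $\Gamma(k)$ counts its updates up to time $k$; it returns $k\gamma\,\alpha(k)$ with $\alpha(k) = 1/\Gamma(k)$, which should be close to 1 for large $k$. The function name and the parameter values are illustrative.

```python
import random

def stepsize_ratio(gamma, num_iters, seed=0):
    """Simulate an agent that updates w.p. gamma at each k (independently of the past)
    and return k * gamma * alpha(k) at k = num_iters, where alpha(k) = 1 / Gamma(k)."""
    rng = random.Random(seed)
    count = 0                                  # Gamma(k): number of updates so far
    for _ in range(num_iters):
        if rng.random() < gamma:
            count += 1
    return num_iters * gamma / max(count, 1)   # guard against a zero count

print(stepsize_ratio(0.5, 20000))
```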

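Proof of Lemma 2: Consider $i \in \{I_k, J_k\}$, and use Lemma 7(a) with the following identification: $\mathcal{Y} = \mathcal{X}_i^{\Omega_i(k)}$ and $\check{x} \in \mathcal{X} \subseteq \mathcal{X}_i^{\Omega_i(k)}$, $y = x_i(k)$, $x = v_i(k)$, $z = z_i(k) = \Pi_{\mathcal{X}}[v_i(k)]$, $\phi = f_i$, and $\alpha = \alpha_i(k)$. Then, for any $\check{x} \in \mathcal{X}$ and $k \ge 1$,

$$\begin{aligned}
\|x_i(k) - \check{x}\|^2 \le{}& (1 + 8\alpha_i^2(k)L_i^2)\,\|v_i(k) - \check{x}\|^2 - 2\alpha_i(k)\big(f_i(z_i(k)) - f_i(\check{x})\big) - \frac{3}{4}\|x_i(k) - v_i(k)\|^2 \\
&+ (8 + \tau_2)\,\alpha_i^2(k)\,\|\nabla f_i(\check{x})\|^2 + \tau_1\alpha_i^2(k)L_i^2\,\|z_i(k) - \check{x}\|^2 + \Big(\frac{1}{\tau_1} + \frac{1}{\tau_2}\Big)\|v_i(k) - z_i(k)\|^2.
\end{aligned}$$

By Assumption 2(d), we have $\|\nabla f_i(\check{x})\| \le G_f$. Further, we let $\tau_1 = \tau_2 = 4\eta$ for some $\eta > 0$, and by using Lemma 8(b) we find that w.p.1 for all $k$ large enough,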

\\xi(k) - x\\ 2 < (l + §) \\vi{k) - x\\ 2 - 2a % {k) (fi(zi(k)) - fc(x)) 

- - A \\ Xi {k) - Vi (k)\\ 2 + |§ + j±\\zi(k) -x\\ 2 + i||«,(fc) - z^W 2 , (41) 

where c x = fff^ , L = max^Li, c 2 = ^^^ and c 3 = j^ . Next, consider 
2ai(fc) (fi(zi(k)) — fi(x)), for which we can write 

1 



2ai(k) (fi(zi(k)) - /,(*)) > ^- {h{zi{k)) - /«(*)) - 2 



a<(ifc) H 



\fi(zi(k)) - /,(*)! . 



Since $f_i$ has bounded gradients over the set $\mathcal{X}$, it is Lipschitz continuous over $\mathcal{X}$. Thus, since $z_i(k), \check{x} \in \mathcal{X}$, it follows that $|f_i(z_i(k)) - f_i(\check{x})| \le G_f\|z_i(k) - \check{x}\|$. This and Lemma 8(c) imply

2a t {k) {fi{zi{k)) - /«(*)) > A (/<(*(*;)) - /,(*)) - 2 2 -G7||*(fc) - £|| 

fc 7i fc3"«(l + vr min ) 2 

> l~ mm - m) - p— -^ — - (g? + u*(*) - *u 2 ) , 

fc 7i fc2 9 (l + 7T min ) 2 

where the last inequality follows by the Cauchy-Schwarz inequality. Combining the preceding 
relation with Eq. (l41~l) . we obtain w.p.l for k large enough 

Mk) -xf < (l + ^) |h(fc) - x\\ 2 - -^ (/<&(*)) - /,(*)) - |||xi(fc) - ^)|| 2 

C 2 , C 4 , /^C 3 , C 5 "\ „ /? ^ 2 1 2 



where c - = wt^ G ) and Cs = wb&- 
By the definition of the projection, we have $\|v_i(k) - z_i(k)\| = \mathrm{dist}(v_i(k), \mathcal{X})$, and

$$\|x_i(k) - v_i(k)\| \ge \big\|\Pi_{\mathcal{X}_i^{\Omega_i(k)}}[v_i(k)] - v_i(k)\big\| = \mathrm{dist}\big(v_i(k), \mathcal{X}_i^{\Omega_i(k)}\big).$$

Taking the expectation in (42) conditioned jointly on $\mathcal{F}_{k-1}$, $I_k$ and $J_k$, we obtain for any $\check{x} \in \mathcal{X}$ and $i \in \{I_k, J_k\}$, w.p.1 for all $k$ large enough,

2 
lik 



E [(MA:) - x\\ 2 | Jfc„i, 4, J fc ] < (l + S) IKW - zf - 4l (/i(*(*0) - /<(*)) 



-^ 



dist 2 (ui(fc),^ 



fii(fc)> 



Ui(fc) 



+ 



fc : 

C 6 , C 7 ii.. ^ -||2 , 



7T-; + 7^—11^(^-^11 +^-dist (vi(k),X), 
kz q b q /? 7 



where c 6 = c 2 + c 4 and c 7 = c 3 + c 5 . Using the regularity condition (Assumption |3), we have 



dist 2 (ui(fc),^ 



£M*0> 



Vi(k) >-dist 2 (vi(k),X). 
c 



Thus, by letting rj = c, from the preceding two relations we have w.p. 1 for all k large enough 

E [\\ Xi (k) -x\\ 2 \ F k _ u 4, J fc ] < (l + §) lk(*0 - *f - -^ (/i(*(*0) - /i(*)) 



-ldist 2 K(A;), AT) + #- + ^\\z t (k) - x\\ 2 . 

4:C 



k2~1 k" q 

The preceding inequality holds with probability $\gamma_i$ (when agent $i$ updates), and otherwise $x_i(k) = v_i(k)$ with probability $1 - \gamma_i$ (when agent $i$ does not update). Hence, w.p.1 for any $\check{x} \in \mathcal{X}$, all $i \in V$, and all $k$ large enough we have



E[||x i (A:)-i|| 2 I J=- fc _J < (l- 
-^E[dist 2 (^(fc),^)| JT^J 



7i«i 



E [^(A;) - x|| 2 I J- fc _!] - -E [£(*(*;)) - /i(x) I .Ffc-i 



C 6 7* c 7 



fci" 9 fci-9 



E[||^(A;)-x|| 2 | J- fe -i]. 



Since $\gamma_i \le 1$, the relation of Lemma 2 follows by letting $a_1 = c_1$, $a_2 = c_6$ and $a_3 = c_7$.



B. Proof of Lemma 3

The proof of this lemma and the proofs of the other lemmas often rely on the relations implied by the convexity of the squared-norm. In particular, by the definition of $v_i(k)$ in (4a), the convexity of the squared-norm function and the doubly stochastic weights $W(k)$, we have for any $x \in \mathbb{R}^d$,

$$\sum_{i=1}^m \mathsf{E}\big[\|v_i(k) - x\|^2 \mid \mathcal{F}_{k-1}\big] \le \sum_{i=1}^m \|x_i(k-1) - x\|^2. \tag{43}$$

Similarly, by the convexity of the distance function $x \mapsto \mathrm{dist}^2(x, \mathcal{X})$ (see [2, p. 88]), we have

$$\sum_{i=1}^m \mathsf{E}\big[\mathrm{dist}^2(v_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}\big] \le \sum_{i=1}^m \mathrm{dist}^2(x_i(k-1), \mathcal{X}). \tag{44}$$

i=i i=i 

Proof of Lemma 3: To prove part (a), we start with Lemma 2, where we let $\check{x} = z_i(k) = \Pi_{\mathcal{X}}[v_i(k)]$. Then, for all $k$ large enough and all $i \in V$, we obtain w.p.1,



E Nix,- 



(A;) — n^[^(fc)]|| 2 I J^c-l] < (l + ^)E[dist\v i (k),X)\T k . 1 ] 

-^E [dist 2 (^(A:), X) | JF^} + -^, (45) 
4c k'i q 

where $q \in (0, 1/2)$. By the definition of the projection, we have $\mathrm{dist}(x_i(k), \mathcal{X}) \le \|x_i(k) - \Pi_{\mathcal{X}}[v_i(k)]\|$. Using this relation in Eq. (45) and, then, summing the resulting relations over $i$ and applying Eq. (44), we find that w.p.1 for all $k$ large enough and all $i \in V$,

m m 

^2E[east 2 ( Xi {k),X)\ F k _ x ] < (l + ^^Tdist 2 ^-!),*) 

m 

-f E E [dist 2 (^(fc), #) | Tik-J + ^, (46) 

1 = 1 

where 7 = min; 7*. Therefore, for all k large enough, the conditions of Lemma [Hare satisfied (for 
a time-delayed sequence), so we conclude that Y^k=i ^ [dist 2 (vj(/c), X) \ Fk-i] < 00 for all i. 
Taking the total expectation in relation (46), it also follows that $\sum_{k=1}^\infty \mathsf{E}[\mathrm{dist}^2(v_i(k), \mathcal{X})] < \infty$ for all $i \in V$, which by the Monotone Convergence Theorem [27, p. 92] yields $\lim_{k\to\infty} \mathrm{dist}(v_i(k), \mathcal{X}) = 0$ for all $i$ w.p.1, showing the result in part (a).

For part (b), note that for $\|e_i(k)\|$, using $z_i(k) = \Pi_{\mathcal{X}}[v_i(k)]$, we can write for $i \in \{I_k, J_k\}$,

$$\|e_i(k)\| \le \|x_i(k) - z_i(k)\| + \|z_i(k) - v_i(k)\| = \big\|\Pi_{\mathcal{X}_i^{\Omega_i(k)}}[v_i(k) - \alpha_i(k)\nabla f_i(v_i(k))] - z_i(k)\big\| + \|z_i(k) - v_i(k)\|.$$

Since $\mathcal{X} \subseteq \mathcal{X}_i^{\Omega_i(k)}$ and $z_i(k) \in \mathcal{X}$, we have $z_i(k) \in \mathcal{X}_i^{\Omega_i(k)}$. Using the non-expansiveness property of the projection, we obtain

$$\begin{aligned}
\|e_i(k)\| &\le \|v_i(k) - \alpha_i(k)\nabla f_i(v_i(k)) - z_i(k)\| + \|z_i(k) - v_i(k)\| \\
&\le 2\|v_i(k) - z_i(k)\| + \alpha_i(k)\|\nabla f_i(v_i(k))\|.
\end{aligned}$$

Further, from the Lipschitz gradient property of $f_i$ and the gradient boundedness property (Assumptions 2(c) and 2(d)) it follows that

$$\begin{aligned}
\|e_i(k)\| &\le 2\|v_i(k) - z_i(k)\| + \alpha_i(k)\big(\|\nabla f_i(v_i(k)) - \nabla f_i(z_i(k))\| + \|\nabla f_i(z_i(k))\|\big) \\
&\le (2 + \alpha_i(1)L_i)\,\mathrm{dist}(v_i(k), \mathcal{X}) + \alpha_i(k)G_f,
\end{aligned} \tag{47}$$

where the last inequality follows by $\alpha_i(k) \le \alpha_i(1)$ and $\|v_i(k) - z_i(k)\| = \mathrm{dist}(v_i(k), \mathcal{X})$. Using the Cauchy-Schwarz inequality and Lemma 8(a) (i.e., $\alpha_i(k) \le 2/(k\gamma_i)$), we have for all $i \in \{I_k, J_k\}$ and $k \ge \tilde{k}$,

$$\|e_i(k)\|^2 \le 2(2 + \alpha_i(1)L_i)^2\,\mathrm{dist}^2(v_i(k), \mathcal{X}) + \frac{8m^2}{k^2}G_f^2, \tag{48}$$

where we also use $\gamma_i \ge \frac{1}{m}$. Taking the expectation in (48) conditioned on $\mathcal{F}_{k-1}$, $I_k$, $J_k$ and noting that the preceding inequality holds with probability $\gamma_i$, and $x_i(k) = v_i(k)$ with probability $1 - \gamma_i$, we obtain with probability 1 for all $k \ge \tilde{k}$ and $i \in V$,

$$\mathsf{E}\big[\|e_i(k)\|^2 \mid \mathcal{F}_{k-1}\big] \le 2\gamma_i(2 + \alpha_i(1)L)^2\,\mathsf{E}\big[\mathrm{dist}^2(v_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}\big] + \frac{8\gamma_i m^2}{k^2}G_f^2,$$

where $L = \max_i L_i$. By part (a) of this lemma, we have $\sum_{k=1}^\infty \mathsf{E}[\mathrm{dist}^2(v_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}] < \infty$ w.p.1 for all $i$. As $\sum_{k=1}^\infty \frac{1}{k^2} < \infty$, we conclude that $\sum_{k=1}^\infty \mathsf{E}[\|e_i(k)\|^2 \mid \mathcal{F}_{k-1}] < \infty$ for all $i \in V$ w.p.1. Furthermore, by relation (48) and part (a) of the lemma we find that $\lim_{k\to\infty} \|e_i(k)\| = 0$ for all $i$ w.p.1. ■

C. Proof of Lemma 4

The proof of this lemma makes use of an additional result, which is given below.

Lemma 9: Let $\{W(k)\}$ be an iid sequence of $m \times m$ symmetric and stochastic matrices. Consider a sequence $\{\theta(k)\} \subset \mathbb{R}^m$ generated by the following dynamics

$$\theta(k) = W(k)\theta(k-1) + e(k) \quad \text{for } k \ge 1. \tag{49}$$

Then, we have with probability 1 for all $k \ge 1$,

$$\mathsf{E}\big[\|\Delta(k)\| \mid \mathcal{F}_{k-1}\big] \le \sqrt{\lambda}\,\|\Delta(k-1)\| + \mathsf{E}\big[\|e(k)\| \mid \mathcal{F}_{k-1}\big],$$

where $\Delta(k) = \theta(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k)$ and $\lambda < 1$ is the second largest eigenvalue of $\bar{W} = \mathsf{E}[W(k)]$.

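Proof: Define the sequences of averaged coordinate values as $\theta_k^{\mathrm{ave}} = \frac{1}{m}\sum_{i=1}^m \theta_i(k)$ and $e_k^{\mathrm{ave}} = \frac{1}{m}\sum_{i=1}^m e_i(k)$. From relation (49), by taking averages over the coordinates, we have $\theta_k^{\mathrm{ave}} = \theta_{k-1}^{\mathrm{ave}} + e_k^{\mathrm{ave}}$. Using $\mathbf{1} \in \mathbb{R}^m$, a vector with all its elements equal to 1, we can write $\theta_k^{\mathrm{ave}}\mathbf{1} = \theta_{k-1}^{\mathrm{ave}}\mathbf{1} + e_k^{\mathrm{ave}}\mathbf{1}$, or equivalently,

$$\frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k) = \frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k-1) + \frac{1}{m}\mathbf{1}\mathbf{1}^T e(k). \tag{50}$$

From equations (49) and (50), we have:

$$\theta(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k) = \Big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\Big)\theta(k-1) + \Big(I - \frac{1}{m}\mathbf{1}\mathbf{1}^T\Big)e(k).$$

Since $\big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\big)\frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k-1) = \big(W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\big)\theta_{k-1}^{\mathrm{ave}}\mathbf{1} = 0$, by letting $\Delta(k) = \theta(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T\theta(k)$, $D_k = W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T$ and $M = I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$, it follows that $\Delta(k) = D_k\Delta(k-1) + Me(k)$ for all $k \ge 1$. By taking the norm and the expectation conditioned on the history $\mathcal{F}_{k-1}$, from the preceding relation and the triangle inequality we have w.p.1 for $k \ge 1$,

$$\mathsf{E}\big[\|\Delta(k)\| \mid \mathcal{F}_{k-1}\big] \le \mathsf{E}\big[\|D_k\Delta(k-1)\| \mid \mathcal{F}_{k-1}\big] + \mathsf{E}\big[\|Me(k)\| \mid \mathcal{F}_{k-1}\big]. \tag{51}$$

From Eq. © and the fact that $W(k)$ is independent of the past $\mathcal{F}_{k-1}$, we obtain $\mathsf{E}[\|D_k\Delta(k-1)\|^2 \mid \mathcal{F}_{k-1}] \le \lambda\|\Delta(k-1)\|^2$, where $\lambda$ is the second largest eigenvalue of the matrix $\bar{W}$. Using $\mathsf{E}[\|x\|] \le \sqrt{\mathsf{E}[\|x\|^2]}$, we obtain $\mathsf{E}[\|D_k\Delta(k-1)\| \mid \mathcal{F}_{k-1}] \le \sqrt{\lambda}\,\|\Delta(k-1)\|$ for all $k \ge 1$. For the second term on the right hand side of (51), we have $\mathsf{E}[\|Me(k)\| \mid \mathcal{F}_{k-1}] \le \mathsf{E}[\|e(k)\| \mid \mathcal{F}_{k-1}]$, since $M = I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$ is a projection matrix and, thus, $\|M\| = 1$. ■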
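The contraction $\mathsf{E}[\|D_k\Delta\|^2] \le \lambda\|\Delta\|^2$ used in the proof can be checked exactly on a small graph, because the expectation is a finite sum over the possible gossip pairs. The Python sketch below does this for the cycle with $m = 4$ and $\lambda = 0.75$ from Table I, assuming pairwise averaging and the uniform selection rule of Section VI. For a vector $\Delta$ whose entries sum to zero, $D_k\Delta = W(k)\Delta$. The function name and the test vector are illustrative.

```python
def expected_sq_norm_after_gossip(delta, neighbors):
    """E||W(k) delta||^2 when agent i wakes w.p. 1/m, picks neighbor j w.p. 1/deg(i),
    and the pair (i, j) replaces both entries by their average (assumed gossip form)."""
    m = len(delta)
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            p = 1.0 / (m * len(nbrs))
            out = list(delta)
            out[i] = out[j] = 0.5 * (delta[i] + delta[j])
            total += p * sum(v * v for v in out)
    return total

cycle4 = [[3, 1], [0, 2], [1, 3], [2, 0]]
delta = [3.0, -1.0, 0.5, -2.5]        # entries sum to zero, i.e. delta is orthogonal to 1
lam = 0.75                            # Table I, cycle, m = 4
lhs = expected_sq_norm_after_gossip(delta, cycle4)
rhs = lam * sum(v * v for v in delta)
print(lhs, rhs)
```

For this vector the left-hand side evaluates to 9.3125 and the right-hand side to 12.375.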

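Proof of Lemma 4: We consider coordinate-wise relations by defining the vector $y_\ell(k) \in \mathbb{R}^m$ for $\ell = 1, \ldots, d$ such that $[y_\ell(k)]_i = [x_i(k)]_\ell$ for all $i$. From algorithm (4a)-(4c), we have

$$y_\ell(k) = W(k)y_\ell(k-1) + \delta_\ell(k) \quad \text{for } k \ge 1,$$

where $\delta_\ell(k) \in \mathbb{R}^m$ is a vector whose coordinates are defined by

$$[\delta_\ell(k)]_i = \Big[\Pi_{\mathcal{X}_i^{\Omega_i(k)}}\big[v_i(k) - \alpha_i(k)\nabla f_i(v_i(k))\big] - v_i(k)\Big]_\ell \quad \text{if } i \in \{I_k, J_k\}, \tag{52}$$

and otherwise $[\delta_\ell(k)]_i = 0$. Since the matrices $W(k)$ are doubly stochastic for all $k \ge 1$, from Lemma 9 we obtain

$$\mathsf{E}\big[\|y_\ell(k) - [\bar{x}(k)]_\ell\mathbf{1}\| \mid \mathcal{F}_{k-1}\big] \le \sqrt{\lambda}\,\|y_\ell(k-1) - [\bar{x}(k-1)]_\ell\mathbf{1}\| + \mathsf{E}\big[\|\delta_\ell(k)\| \mid \mathcal{F}_{k-1}\big], \tag{53}$$

where $[\bar{x}(k)]_\ell = \frac{1}{m}\mathbf{1}^T y_\ell(k)$ and $\lambda < 1$ by Assumption 1.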

We next consider Si(k) as given by (|52l , for which we have for all k > 1, 



II Wf < 5Z |K niW K(fc) - OiWiVfiiviih)))] - Vi (k) 



2 



April 8, 2013 DRAFT 






Letting Zi(k) = n^^^fc)], observing that Zi(k) e ^ i() and using the projection property in 
Eq. ©, we obtain 

IIM*0II 2 < E ( n ^w[^( fc ) -Oi(*)v/i(^(fc))] -*(fc) + Iki(fc) -^(fc)||) 

i=l 

III 

< J2 («*(*) l|V/i(vi(fc))|| + 2||*(A;) - Wi (A;)||) 2 . 

i=l 

Applying the Cauchy-Schwartz inequality, we can obtain 

Til 

mm 2 < E ( 2a '( fc ) iiWi(«i(A;))ii a + 4||*(ao - ^aoii 2 ) . 

The term || V fi(vi(k)) || 2 can be further evaluated by using the Lipschitz property and the bounded 
gradient assumption (Assumption |2fd)), 

\\^fi(vi(k))\\ 2 < 2||V/ i (« i (A;)) - V/ifoCAO)!! 2 + 2||V/,(z 4 (fc))|| 2 < 2L?|K(fc) - ^(A;)|| 2 + 2G 2 . 

From Lemma (8fb), there exists a large enough fe such that a 2 (k) < Am 2 /k 2 < Am 2 /k 2 w.p.l 
for all k > k. Therefore, noting that \\zi(k) — Vi(k)\\ = dist(vi(k), X), we obtain for all k > k 
with probability 1, 

\Mk)\\ 2 < U+^lA f>t 2 (^),*) + ±^G 2 , 



with L = maxjLi. Taking the expectation with respect to J^-i and using E[||x||] < \J E[||a;|| 2 ], 
we obtain E[||<^(fc)|| I J-'k-i] < b k , where 



N 



4 



16m 2 - 



k" 2 



EEfdist 2 ^^),^)! JU] 



16m 2 2 

^ 2_G /- 



From the preceding and relation (|53l ). we obtain for all fc > fc with probability 1, 

^E[\\y t (k) - [x{k)]il\\ | J-fc-i] < j^ylM* - 1) - [s(fe - i)]a|| 

- ^-j^llwC* - 1) - [*(* - WH + ]** (54) 

Noting that $\frac{1}{k}b_k \le (1/k^2 + b_k^2)/2$, and that $\sum_{k=1}^\infty b_k^2 < \infty$ by Lemma 3(a), the term $\frac{1}{k}b_k$ is summable. From this and the fact that $1 - \sqrt{\lambda} > 0$, relation (54) satisfies all the conditions in Lemma 1. It follows that $\sum_{k=1}^\infty \frac{1}{k}\mathsf{E}[\|y_\ell(k) - [\bar{x}(k)]_\ell\mathbf{1}\| \mid \mathcal{F}_{k-1}] < \infty$ with probability 1 for any $\ell = 1, \ldots, d$. This and the definition of $y_\ell(k)$ imply that with probability 1

$$\sum_{k=1}^\infty \frac{1}{k}\mathsf{E}\big[\|x_i(k) - \bar{x}(k)\| \mid \mathcal{F}_{k-1}\big] < \infty \quad \text{for all } i \in V, \tag{55}$$

where $\bar{x}(k) = \frac{1}{m}\sum_{j=1}^m x_j(k)$. Next, consider $\|v_i(k) - \bar{v}(k)\|$. Since $v_i(k) = \sum_{j=1}^m [W(k)]_{ij}\,x_j(k-1)$ (see (4a)) and $W(k)$ is doubly stochastic, by using the convexity of the norm, for $\bar{v}(k) = \frac{1}{m}\sum_{\ell=1}^m v_\ell(k)$ we can see that $\sum_{i=1}^m \|v_i(k) - \bar{v}(k)\| \le \sum_{j=1}^m \|x_j(k-1) - \bar{x}(k-1)\|$. By using relation (55), we conclude that $\sum_{k=1}^\infty \frac{1}{k}\mathsf{E}[\|v_i(k) - \bar{v}(k)\| \mid \mathcal{F}_{k-1}] < \infty$ for all $i \in V$ w.p.1. ■

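D. Proof of Lemma 5

Let $i \in \{I_k, J_k\}$. Then, using the definition of the iterate $x_i(k)$ in (4a)-(4c), and Lemma 7(b) with the following identification: $\mathcal{Y} = \mathcal{X}_i^{\Omega_i(k)}$, $y = x_i(k)$, $x = v_i(k)$, $z = z_i(k) = \Pi_{\mathcal{X}}[v_i(k)]$, $\alpha = \alpha_i$, $\check{x} \in \mathcal{X}$, $\phi = f_i$, $L = L_i$ and $\tau_1 = \tau_2 = 8c$, we obtain

$$\begin{aligned}
\|x_i(k) - \check{x}\|^2 \le{}& (1 - \alpha_i\sigma_i + 8\alpha_i^2 L_i^2)\,\|v_i(k) - \check{x}\|^2 - 2\alpha_i\langle \nabla f_i(\check{x}), z_i(k) - \check{x}\rangle \\
&- \frac{3}{4}\|x_i(k) - v_i(k)\|^2 + 8(1+c)\alpha_i^2\,\|\nabla f_i(\check{x})\|^2 + 4c\,\alpha_i^2 L_i^2\,\|z_i(k) - \check{x}\|^2 + \frac{1}{4c}\|v_i(k) - z_i(k)\|^2.
\end{aligned}$$

By Assumption 2(d), we have $\|\nabla f_i(\check{x})\| \le G_f$, while by the non-expansiveness projection property we have $\|z_i(k) - \check{x}\| \le \|v_i(k) - \check{x}\|$. Furthermore, $\|v_i(k) - z_i(k)\| = \mathrm{dist}(v_i(k), \mathcal{X})$ since $z_i(k) = \Pi_{\mathcal{X}}[v_i(k)]$. Therefore, for all $k \ge 1$ and $i \in \{I_k, J_k\}$,

$$\begin{aligned}
\|x_i(k) - \check{x}\|^2 \le{}& (1 - \alpha_i\sigma_i + 4(2+c)\alpha_i^2 L_i^2)\,\|v_i(k) - \check{x}\|^2 - 2\alpha_i\langle \nabla f_i(\check{x}), z_i(k) - \check{x}\rangle \\
&+ 8(1+c)\alpha_i^2 G_f^2 - \frac{3}{4}\|x_i(k) - v_i(k)\|^2 + \frac{1}{4c}\,\mathrm{dist}^2(v_i(k), \mathcal{X}).
\end{aligned} \tag{56}$$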

By the definition of $x_i(k)$, we have $x_i(k) \in \mathcal{X}_i^{\Omega_i(k)}$, which implies

$$\mathsf{E}\big[\|v_i(k) - x_i(k)\|^2 \mid \mathcal{F}_{k-1}, I_k, J_k\big] \ge \mathsf{E}\big[\mathrm{dist}^2\big(v_i(k), \mathcal{X}_i^{\Omega_i(k)}\big) \mid \mathcal{F}_{k-1}, I_k, J_k\big].$$

By Assumption 3 it follows

$$\mathrm{dist}^2(v_i(k), \mathcal{X}) \le c\,\mathsf{E}\big[\mathrm{dist}^2\big(v_i(k), \mathcal{X}_i^{\Omega_i(k)}\big) \mid \mathcal{F}_{k-1}, I_k, J_k\big] \quad \text{for all } i.$$

Therefore, after taking the conditional expectation, the sum of the last two terms in Eq. (56) is nonpositive, and by dropping these terms we obtain the following relation w.p.1 for all $k \ge 1$ and $i \in \{I_k, J_k\}$,

$$\mathsf{E}\big[\|x_i(k) - \check{x}\|^2 \mid \mathcal{F}_{k-1}, I_k, J_k\big] \le (1 - p_i)\,\|v_i(k) - \check{x}\|^2 - 2\alpha_i\langle \nabla f_i(\check{x}), z_i(k) - \check{x}\rangle + 8(1+c)\alpha_i^2 G_f^2,$$

where $p_i = \sigma_i\alpha_i - 4(2+c)\alpha_i^2 L_i^2$. ■

E. Proof of Lemma 6

In the proof of Lemma 6, we use the following result (see Lemma 3.1 in [25] for its proof).

Lemma 10: If $\lim_{k\to\infty} \gamma(k) = \gamma$ and $0 < \beta < 1$, then $\lim_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma(\ell) = \frac{\gamma}{1-\beta}$.

In addition, we also use an asymptotic upper bound for the distance between the iterates $x_i(k)$ and the set $\mathcal{X}$, which is given in the following lemma.
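Lemma 10 is the tool that turns a recursion of the form $u_k \le \beta u_{k-1} + \gamma(k)$ into an asymptotic bound. A quick numerical illustration in Python follows; the sequence $\gamma(\ell) = 2 + 1/(\ell+1)$ and the value $\beta = 0.9$ are arbitrary choices made for this sketch.

```python
def discounted_sum(gamma_seq, beta):
    """Return sum_{l=0}^{k} beta^(k-l) * gamma(l) for k = len(gamma_seq) - 1."""
    total = 0.0
    for g in gamma_seq:
        total = beta * total + g      # recursion s_k = beta * s_{k-1} + gamma(k)
    return total

beta, limit = 0.9, 2.0
gammas = [limit + 1.0 / (l + 1) for l in range(5000)]   # gamma(l) -> 2
print(discounted_sum(gammas, beta), limit / (1.0 - beta))
```

The two printed numbers should nearly coincide, in line with the limit $\gamma/(1-\beta)$ stated in the lemma.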

Lemma 11: Let Assumptions |2HU hold, where the stepsizes «j satisfy As sumption 0J a). Then, 
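for the iterates $x_i(k)$ of the method, we have

$$\limsup_{k\to\infty} \sum_{i=1}^m \mathsf{E}\big[\mathrm{dist}^2(x_i(k), \mathcal{X})\big] \le \frac{8(1+c)\, m\, \bar{\gamma}\bar{\alpha}^2 G_f^2}{\min_i\{\gamma_i p_i\}},$$

where $p_i = \sigma_i\alpha_i - 4(2+c)\alpha_i^2 L_i^2$.

Proof: We use Lemma 5 with $\check{x} = \Pi_{\mathcal{X}}[v_i(k)] = z_i(k)$, so that w.p.1 for all $k \ge 1$ and $i \in \{I_k, J_k\}$,

$$\mathsf{E}\big[\|x_i(k) - z_i(k)\|^2 \mid \mathcal{F}_{k-1}, I_k, J_k\big] \le (1 - p_i)\,\mathrm{dist}^2(v_i(k), \mathcal{X}) + 8(1+c)\alpha_i^2 G_f^2,$$

with $p_i = \sigma_i\alpha_i - 4(2+c)\alpha_i^2 L_i^2$. We note that $\mathrm{dist}(x_i(k), \mathcal{X}) \le \|x_i(k) - z_i(k)\|$. Thus, we have w.p.1 for $i \in \{I_k, J_k\}$ and $k \ge 1$,

$$\mathsf{E}\big[\mathrm{dist}^2(x_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}, I_k, J_k\big] \le (1 - p_i)\,\mathrm{dist}^2(v_i(k), \mathcal{X}) + 8(1+c)\alpha_i^2 G_f^2.$$

The preceding relation holds with probability $\gamma_i$ and, otherwise, $x_i(k) = v_i(k)$ with probability $1 - \gamma_i$. Thus, w.p.1 for all $k \ge 1$ and $i \in V$,

$$\mathsf{E}\big[\mathrm{dist}^2(x_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}\big] \le (1 - \gamma_i p_i)\,\mathsf{E}\big[\mathrm{dist}^2(v_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}\big] + 8(1+c)\gamma_i\alpha_i^2 G_f^2.$$

By summing over $i$ and using relation (44), we obtain

$$\sum_{i=1}^m \mathsf{E}\big[\mathrm{dist}^2(x_i(k), \mathcal{X}) \mid \mathcal{F}_{k-1}\big] \le \big(1 - \min_i\{\gamma_i p_i\}\big)\sum_{j=1}^m \mathrm{dist}^2(x_j(k-1), \mathcal{X}) + 8(1+c)\, m\, \bar{\gamma}\bar{\alpha}^2 G_f^2,$$

where $\bar{\gamma} = \max_i \gamma_i$ and $\bar{\alpha} = \max_i \alpha_i$. Note that when $p_i \in (0, 1)$ for all $i$, then we also have $\gamma_i p_i \in (0, 1)$ since $\gamma_i \in (0, 1)$ for all $i$. Taking the total expectation and, then, applying Lemma 10 we obtain the desired relation. ■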

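Proof of Lemma 6: We consider coordinate-wise relations similar to the proof of Lemma 4. Since the matrices $W(k)$ are doubly stochastic for all $k \ge 1$, from relation (51) with $\|M\| = 1$ and Hölder's inequality, we obtain

$$\sqrt{\sum_{\ell=1}^d \mathsf{E}\big[\|y_\ell(k) - [\bar{x}(k)]_\ell\mathbf{1}\|^2\big]} \le \sqrt{\sum_{\ell=1}^d \mathsf{E}\big[\|D_k\big(y_\ell(k-1) - [\bar{x}(k-1)]_\ell\mathbf{1}\big)\|^2\big]} + \sqrt{\sum_{\ell=1}^d \mathsf{E}\big[\|\delta_\ell(k)\|^2\big]}, \tag{57}$$

where $[\bar{x}(k)]_\ell = \frac{1}{m}\mathbf{1}^T y_\ell(k)$, $D_k = W(k) - \frac{1}{m}\mathbf{1}\mathbf{1}^T$, and $\lambda < 1$. From relation ©, we know that

$$\sum_{\ell=1}^d \mathsf{E}\big[\|D_k\big(y_\ell(k-1) - [\bar{x}(k-1)]_\ell\mathbf{1}\big)\|^2\big] \le \lambda \sum_{\ell=1}^d \mathsf{E}\big[\|y_\ell(k-1) - [\bar{x}(k-1)]_\ell\mathbf{1}\|^2\big]. \tag{58}$$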

The second term in (57) is evaluated similarly to that in Lemma 4. Hence, for all $k \ge 1$ w.p.1,

$$\sqrt{\sum_{\ell=1}^d \mathsf{E}\big[\|\delta_\ell(k)\|^2\big]} \le \beta_k, \tag{59}$$

where $\beta_k = \sqrt{(4 + 4\bar{\alpha}^2 L^2)\sum_{i=1}^m \mathsf{E}\big[\mathrm{dist}^2(v_i(k), \mathcal{X})\big] + 4m\bar{\alpha}^2 G_f^2}$, $\bar{\alpha} = \max_i \alpha_i$ and $L = \max_i L_i$. Letting $u_k = \sqrt{\sum_{\ell=1}^d \mathsf{E}[\|y_\ell(k) - [\bar{x}(k)]_\ell\mathbf{1}\|^2]}$ in (57) and using relations (58) and (59), we have

$$u_k \le \sqrt{\lambda}\,u_{k-1} + \beta_k \quad \text{for all } k \ge 1.$$

Since $\lambda < 1$, by Lemma 10 we have $\limsup_{k\to\infty} u_k \le \limsup_{k\to\infty} \beta_k/(1 - \sqrt{\lambda})$, implying that

$$\limsup_{k\to\infty} u_k^2 \le \frac{1}{(1 - \sqrt{\lambda})^2}\,\limsup_{k\to\infty} \beta_k^2. \tag{60}$$

By relation (44) it follows that

$$\limsup_{k\to\infty} \beta_k^2 \le (4 + 4\bar{\alpha}^2 L^2)\,\limsup_{k\to\infty} \sum_{i=1}^m \mathsf{E}\big[\mathrm{dist}^2(x_i(k-1), \mathcal{X})\big] + 4m\bar{\alpha}^2 G_f^2. \tag{61}$$

Finally, by the definition of $y_\ell(k)$, we have $u_k^2 = \sum_{i=1}^m \mathsf{E}[\|x_i(k) - \bar{x}(k)\|^2]$. The desired relation follows from Eqs. (60) and (61) and Lemma 11. ■